LIMITING DISTRIBUTIONS OF QUADRATIC AND BILINEAR FORMS¹ ²
By William G. Madow


1. Introduction. In a previous paper [15], several generalizations of the theorem of Fisher [6, p. 97] and Cochran [2, p. 178] on the joint distribution of quadratic forms in normally and independently distributed random variables were derived. The chief purpose of this paper is to demonstrate that the Fisher-Cochran theorem and its generalizations are valid in the limit under conditions completely analogous to those under which the Laplace-Liapounoff theorem holds. Applications to the analysis of variance, periodogram analysis and multivariate analysis are discussed.

Our general procedure will be to find algebraic conditions on the matrices of quadratic and bilinear forms which enable us to assert that the limiting distributions of these forms are those which they would have had if the variables whose squares or products appear in their canonical forms had been normally and independently distributed.³ One thing which makes this possible is the fact that many frequently used quadratic and bilinear forms have the same rank no matter how many variables they are functions of. For example, the rank of the square of the arithmetic mean, $\bar{x}_n$, where

$$\bar{x}_n = \frac{1}{n}(x_1 + x_2 + \cdots + x_n),$$

is one for all values of $n$. In this case the quadratic form

$$\bar{x}_n^2 = \frac{1}{n^2}\sum_{\mu,\nu=1}^{n} x_\mu x_\nu$$

is a function of the $n$ variables $x_1, x_2, \ldots, x_n$.
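As a quick check of the rank claim, the following Python sketch (standard library only) writes $\bar{x}_n^2$ as $\sum_{\mu,\nu} a_{\mu\nu}x_\mu x_\nu$ with every $a_{\mu\nu} = 1/n^2$. It then computes the rank of $\|a_{\mu\nu}\|$ exactly using `fractions.Fraction`. The helper names `rank` and `mean_square_matrix` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix of Fractions, by exact forward Gaussian elimination."""
    rows = [list(row) for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def mean_square_matrix(n):
    """Matrix A with x'Ax = (x_1 + ... + x_n)^2 / n^2: every entry is 1/n^2."""
    return [[Fraction(1, n * n)] * n for _ in range(n)]


for n in (1, 2, 5, 20):
    print(n, rank(mean_square_matrix(n)))  # the rank is 1 for every n
```

Exact rational arithmetic keeps rounding from blurring the rank. A floating-point version would need a tolerance.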

In paragraph 2 we state the vector form of the Laplace-Liapounoff theorem 
and several corollaries. The joint limiting distributions of quadratic and 
bilinear forms are derived in paragraph 3. The final paragraph is devoted to a 
statement of a few applications of the theorems. 


1 Much of this research was done under a grant-in-aid from the Carnegie Corporation of 
New York. 

2 The material contained in this paper was presented in part to the American Statistical Association, December 28, 1937, and in part to the Institute of Mathematical Statistics, December 27, 1938.

3 We shall be chiefly concerned with conditions under which the limiting distributions are not themselves normal. If the limiting distributions are normal, then generally, under the conditions we state, the Laplace-Liapounoff theorem will have been directly applicable.
2. The Laplace-Liapounoff theorem.⁴ We shall first state some definitions and terminology which will be used throughout the paper.

If used as subscripts or superscripts, or as indices of summation or multiplication, the letters $i, j$ will take on all integral values from 1 through $p$. The letters $\mu, \nu$ will take on all integral values from 1 through $n$, and the letters $\gamma, \delta$ will take on all integral values from 1 through $m$. The letter $\alpha$ will take on all integral values from 1 through $k$, and the letter $\beta$ will take on all integral values from 1 through $k-1$. These conventions hold unless explicit statement to the contrary is made.

The totality of all sets of $v$ real numbers will be denoted by $R^v$. Thus $R^v$ is the combinatory product of the spaces $R^1, R^1, \ldots, R^1$ ($v$ times).

If $x_1, \ldots, x_n$ are random variables, and if $A$ is a proposition concerning $x_1, \ldots, x_n$, then by $P\{A\}$ we shall mean "the probability that $A$." The distribution function of the random variables $x_1, \ldots, x_n$ will be denoted by $F(x_1, \ldots, x_n)$, i.e.

$$F(t_1, \ldots, t_n) = P\{x_1 < t_1, \ldots, x_n < t_n\}$$

for all sets of $n$ real numbers $t_1, \ldots, t_n$. Thus $F$ will have an operational meaning in this paper.


Let $A(x_1, \ldots, x_n)$ be a function of $x_1, \ldots, x_n$ defined on $R^n$ and measurable⁵ with respect to $F(x_1, \ldots, x_n)$. Then $E\{A(x_1, \ldots, x_n)\}$ will be defined by the equation

$$E\{A(x_1, \ldots, x_n)\} = \int_{R^n} A(x_1, \ldots, x_n)\, dF(x_1, \ldots, x_n),$$

where the integral is a Lebesgue-Stieltjes or Radon integral. Hence $|A(x_1, \ldots, x_n)|$ is assumed to be integrable with respect to $F(x_1, \ldots, x_n)$. If $Q(y_1, \ldots, y_p)$ is a single valued measurable function of $y_1, \ldots, y_p$ on $R^p$, and if $y_i$ is a real single valued Borel measurable⁶ function of $x_1, \ldots, x_n$ on $R^n$, then upon substituting for $y_1, \ldots, y_p$ it is seen that $Q(y_1, \ldots, y_p)$


4 Although the theorems will be stated in terms of probability distributions, Borel measurability, and Lebesgue-Stieltjes integrability, the reading may be simpler with three substitutions. Replace "probability distributions" by "probability densities" or "statistical distributions," "Borel measurability" by "continuity," and "Lebesgue-Stieltjes integrability" by "Riemann integrability."


5 A function $A(x_1, \ldots, x_n)$ defined on $R^n$ is said to be measurable with respect to a distribution function $F(x_1, \ldots, x_n)$ if $\int_{E(t)} dF(x_1, \ldots, x_n)$ is defined for all $t$. Here $E(t)$ is the set of all $x_1, \ldots, x_n$ such that $A(x_1, \ldots, x_n) < t$.


6 The Borel sets of $R^n$ are $R^n$ itself together with all subsets of $R^n$ that can be formed from the intervals of $R^n$ by repeated summations or multiplications of not more than a denumerable number of intervals. The function $y(x_1, \ldots, x_n)$, defined on $R^n$, is a Borel measurable function of $x_1, \ldots, x_n$ on $R^n$ if, for all $t$, the set of values of $x_1, \ldots, x_n$ such that $y(x_1, \ldots, x_n) < t$ is a Borel set. The class of continuous functions is contained in the class of Borel measurable functions. For further details, see [3, chs. 1, 2], [11, ch. 3] and [17, chs. 1, 2, 3].




is a single-valued measurable function, $A(x_1, \ldots, x_n)$, of $x_1, \ldots, x_n$ on $R^n$. If $x_1, \ldots, x_n$ are random variables, then $y_1, \ldots, y_p$ are random variables, and⁷

$$E\{Q(y_1, \ldots, y_p)\} = E\{A(x_1, \ldots, x_n)\}. \tag{2.1}$$


We shall call $E(x_i)$ the mean value of $x_i$ and $\sigma_{ij}$ the covariance of $x_i$ and $x_j$, where $\sigma_{ij} = E\{(x_i - Ex_i)(x_j - Ex_j)\}$. We shall call $\sigma_{ii}$ or $\sigma_i^2$ the variance of $x_i$.

The Laplace-Liapounoff, or Central Limit, theorem states conditions under which linear functions of random variables have a normal limiting distribution. The proofs of the theorem generally place conditions on the random variables so that they may virtually be assumed to be bounded. The Lindeberg⁸ condition, which we shall use, is perhaps the least restrictive of all the conditions which require finite means and variances.

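The Lindeberg condition⁹, $\mathcal{L}_n$: A set of random variables $z_{\nu n}$ will be said to satisfy the Lindeberg condition $\mathcal{L}_n$ if the following holds. For any preassigned positive real numbers $\delta$ and $\epsilon$, there exists a positive integer $n_0$ such that if $n > n_0$, then

$$\sum_\nu \int_{|z_{\nu n}| > \epsilon} z_{\nu n}^2\, dF(z_{1n}, \ldots, z_{nn}) < \delta,$$

where

$$z_n = z_{1n} + z_{2n} + \cdots + z_{nn}$$

and

$$\sigma_{1n}^2 + \sigma_{2n}^2 + \cdots + \sigma_{nn}^2 = 1,$$

$\sigma_{\nu n}^2$ being the variance of $z_{\nu n}$. If

$$z_{\nu n} = \frac{x_\nu}{s_n}, \qquad \text{where } s_n^2 = \sigma_1^2 + \cdots + \sigma_n^2,$$

and the $z_{\nu n}$ satisfy $\mathcal{L}_n$, then we shall say that the $x_\nu$ satisfy $\mathcal{L}_n$.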
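The sketch below makes the condition concrete with a hypothetical example, not one from the paper. It assumes independent variables $x_\nu = a_\nu\epsilon_\nu$, where each $\epsilon_\nu$ is $+1$ or $-1$ with probability $\tfrac12$. Then $|z_{\nu n}| = |a_\nu|/s_n$ with certainty, so the Lindeberg sum reduces to $\sum a_\nu^2/s_n^2$ over the indices with $|a_\nu|/s_n > \epsilon$. With equal scales the sum vanishes once $n$ is large. With geometrically growing scales a few terms keep dominating, and the condition fails. The function name `lindeberg_sum` is illustrative.

```python
import math


def lindeberg_sum(scales, eps):
    """Lindeberg sum for z_nu = x_nu / s_n with x_nu = a_nu * (+1 or -1).

    Since |z_nu| = |a_nu| / s_n with certainty, the truncated second moment
    E[z_nu^2; |z_nu| > eps] equals a_nu^2 / s_n^2 or 0.
    """
    s_n = math.sqrt(sum(a * a for a in scales))
    return sum((a / s_n) ** 2 for a in scales if abs(a) / s_n > eps)


eps = 0.2
for n in (10, 100, 400):
    equal = lindeberg_sum([1.0] * n, eps)                                # tends to 0
    geometric = lindeberg_sum([2.0 ** k for k in range(1, n + 1)], eps)  # does not tend to 0
    print(n, equal, round(geometric, 3))
```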
Suppose that the random variables $y_{11}, \ldots, y_{pm}$ have a normal multivariate distribution with zero means and with covariance parameters $\sigma_{i\gamma j\delta}$, where

$$\sigma_{i\gamma j\delta} = E(y_{i\gamma}\, y_{j\delta}), \qquad \gamma = 1, \ldots, m;\ \delta = 1, \ldots, m,$$

and denote the distribution function of $y_{11}, \ldots, y_{pm}$ by $N(y)$. Then we may state the Laplace-Liapounoff theorem as follows:


7 It is noted that $Q(y_1, \ldots, y_p)$ is integrated with respect to $F(y_1, \ldots, y_p)$ and $A(x_1, \ldots, x_n)$ is integrated with respect to $F(x_1, \ldots, x_n)$.

8 See Cramér [3, pp. 57, 60, 114], and the references there given.

9 It is not difficult to show that the Lindeberg condition will be satisfied if moments of order greater than two exist [3, p. 60], or if the conditions stated by Lévy [13, p. 207] and [14, p. 106] are satisfied.
THEOREM I. Suppose that, for each value of $n$, the random variables $x_{i\gamma\nu n}$, which are independent for different values of $\nu$, have zero means and covariance parameters $\sigma_{i\gamma j\delta\nu n}$, where

$$\sigma_{i\gamma j\delta\nu n} = E(x_{i\gamma\nu n}\, x_{j\delta\nu n}).$$

Denote by $d_n$ the maximum of the variances $\sigma_{i\gamma i\gamma\nu n}$. If the functions $y_{i\gamma n}$ are defined by the equations

$$y_{i\gamma n} = \sum_\nu x_{i\gamma\nu n},$$

it follows that

$$\sigma_{i\gamma j\delta n} = E(y_{i\gamma n}\, y_{j\delta n}) = \sum_\nu \sigma_{i\gamma j\delta\nu n}.$$

Suppose that $\lim_{n\to\infty}\sigma_{i\gamma j\delta n} = \sigma_{i\gamma j\delta}$ and that $\lim_{n\to\infty} d_n = 0$. Then, as $n \to \infty$, the limiting distribution¹⁰ of $y_{11n}, \ldots, y_{pmn}$ is $N(y)$ if and only if the condition $\mathcal{L}_{pmn}$ is satisfied.

The proof of this theorem is omitted. It may readily be developed from the proofs of Cramér [3, pp. 57, 113].

Before stating certain corollaries which are of interest, some additional 
definitions are necessary. 

Let $C_m, C_{m+1}, \ldots$ be a sequence of $m$-rowed real matrices

$$C_n = \|c_{\gamma\nu n}\|, \qquad n = m, m+1, \ldots,$$

and let the greatest of the absolute values of the elements of $C_n$ be denoted by $d_n$. The inner product of any two rows of $C_n$ will be denoted by $\rho_{\gamma\delta n}$, i.e.

$$\rho_{\gamma\delta n} = \sum_\nu c_{\gamma\nu n}\, c_{\delta\nu n}.$$

Let $X_1, X_2, \ldots$ be a sequence of random vectors of $p$ components defined on $R^p$, and let the components of $X_\nu$ be denoted by $x_{1\nu}, \ldots, x_{p\nu}$. Let $Y_n = \|y_{i\gamma n}\|$ be the chance matrix with $p$ rows and $m$ columns whose components are defined by the equations

$$y_{i\gamma n} = \sum_\nu c_{\gamma\nu n}\, x_{i\nu} \tag{2.2}$$

for each value of $n$ $(n = m, m+1, \ldots;\ m > p)$.





10 The distribution functions $F(X_n)$ will be said to converge to the distribution function $F(X)$ if and only if

$$\lim_{n\to\infty}\int_{-\infty}^{X} dF(X_n) = F(X)$$

for every $X$ at which $F(X)$ is continuous. If $F(X)$ is continuous throughout $R^p$, then the convergence is uniform.
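Suppose that

$$E(x_{i\nu}) = 0 \tag{2.3}$$

and

$$E(x_{i\mu}\, x_{j\nu}) = \sigma_{ij}\,\delta_{\mu\nu}, \tag{2.4}$$

where $\delta_{\mu\nu} = 1$ if $\mu = \nu$ and $\delta_{\mu\nu} = 0$ if $\mu \ne \nu$. (There should be no confusion of this use of the letter $\delta$ with its use as an index.) It is easy to see that if the $c_{\gamma\nu n}$ are real numbers, then

$$E(y_{i\gamma n}) = 0$$

and, by (2.2) and (2.4),

$$E(y_{i\gamma n}\, y_{j\delta n}) = \sum_{\mu,\nu} c_{\gamma\mu n}\, c_{\delta\nu n}\,\sigma_{ij}\,\delta_{\mu\nu} = \sigma_{ij}\,\rho_{\gamma\delta n}.$$

Let the determinant of the positive definite symmetric matrix $(\sigma) = \|\sigma_{ij}\|$ be denoted by $\sigma$. Let the inverse matrix of $(\sigma)$ be denoted by $(\sigma)^{-1} = \|\sigma^{ij}\|$, where $\sigma^{ij}$ is the cofactor of $\sigma_{ij}$ in $(\sigma)$ divided by $\sigma$. The determinant of $(\sigma)^{-1}$ is $\sigma^{-1}$.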

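By $N_a(x_1, \ldots, x_p; (\sigma))$ we shall mean the normal probability density with zero means and covariance parameters $\sigma_{ij}$, i.e.,

$$N_a(x_1, \ldots, x_p; (\sigma)) = (2\pi)^{-\frac12 p}\,\sigma^{-\frac12}\exp\Big[-\tfrac12\sum_{i,j}\sigma^{ij}x_i x_j\Big], \qquad (-\infty < x_i < \infty),$$

where $(\sigma)$ is a positive definite matrix. If the random variables $x_1, \ldots, x_p$ have probability density $N_a(X; (\sigma)) = N_a(x_1, \ldots, x_p; (\sigma))$, where $X$ is a vector, then we shall say that $X$ has a distribution function $N(X; (\sigma))$, i.e.

$$\frac{\partial^p}{\partial x_1 \cdots \partial x_p} N(X; (\sigma)) = N_a(X; (\sigma))$$

or

$$\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_p} N_a(t_1, \ldots, t_p; (\sigma))\, dt_1 \cdots dt_p = N(X; (\sigma)).$$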
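A direct transcription of $N_a(x_1, \ldots, x_p; (\sigma))$ may help in reading the formula. The Python sketch below is illustrative only, and the function names are hypothetical. It obtains $\sigma$ and $(\sigma)^{-1}$ by Gauss-Jordan elimination. This method is swapped in for convenience; the text defines $\sigma^{ij}$ through cofactors, which gives the same matrix. The sketch also assumes $(\sigma)$ is small and well conditioned.

```python
import math


def det_and_inverse(sigma):
    """Determinant and inverse of a square matrix by Gauss-Jordan elimination
    with partial pivoting (adequate for small, well-conditioned matrices)."""
    p = len(sigma)
    # augment (sigma) with the identity matrix
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(sigma)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap changes the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return det, [row[p:] for row in aug]


def normal_density(x, sigma):
    """N_a(x_1, ..., x_p; (sigma)): normal density with zero means."""
    p = len(x)
    det, inv = det_and_inverse(sigma)  # sigma and (sigma)^{-1} in the text
    quad = sum(inv[i][j] * x[i] * x[j] for i in range(p) for j in range(p))
    return (2 * math.pi) ** (-p / 2) * det ** -0.5 * math.exp(-0.5 * quad)


print(normal_density([0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]]))  # equals 1 / (2 pi sqrt(1.75))
```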


Inasmuch as certain hypotheses will be used on several occasions in this paper, they are stated here.

If $X_1, X_2, \ldots$ are independently distributed, if (2.3) and (2.4) hold and if the $x$'s satisfy the condition $\mathcal{L}_n$, then we shall say that $\mathcal{H}_n$ is true.

If $C_n$ is such that, for all $n$, the equations $\rho_{\gamma\delta n} = \delta_{\gamma\delta}$ are true, we shall say that $\mathcal{C}$ is true.

The following corollary is useful in deriving limiting distributions in the analysis of variance.

COROLLARY I. Let $\mathcal{H}_n$ and $\mathcal{C}$ be true. Then a sufficient condition that

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma))$$

is $\lim_{n\to\infty} d_n = 0$.
The proof is based on the fact that the $x_{i\gamma\nu n}$ of Theorem I are given by $c_{\gamma\nu n}x_{i\nu}$. The details are omitted.
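A small simulation illustrates what Corollary I asserts. The setup below is a hypothetical example, not from the paper: $p = 1$ and $m = 2$. The rows of $C_n$ are the normalized cosine and sine vectors $\sqrt{2/n}\cos(2\pi\nu/n)$ and $\sqrt{2/n}\sin(2\pi\nu/n)$, of the kind that occur in periodogram analysis. The $x_\nu$ are independent uniform variables with variance one, so they are far from normal. These rows are orthonormal for $n \ge 3$, so $\mathcal{C}$ holds, and $d_n = \sqrt{2/n} \to 0$. The corollary then suggests that $y_{11n}$ and $y_{12n}$ behave like independent standard normal variables for large $n$. The sketch compares their sample variance, covariance and kurtosis (3 for a normal variable) with those values.

```python
import math
import random


def cosine_sine_rows(n):
    """Rows c_{1 nu n} = sqrt(2/n) cos(2 pi nu/n), c_{2 nu n} = sqrt(2/n) sin(2 pi nu/n),
    nu = 1..n; orthonormal for n >= 3."""
    scale = math.sqrt(2.0 / n)
    angles = [2 * math.pi * nu / n for nu in range(1, n + 1)]
    return [[scale * math.cos(t) for t in angles], [scale * math.sin(t) for t in angles]]


def inner_products(rows):
    """rho_{gamma delta n}: inner products of every pair of rows."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def simulate(n, reps, seed=1):
    """Draw reps copies of (y_{11n}, y_{12n}) = C_n x with x_nu iid uniform on
    (-sqrt 3, sqrt 3), so that E x_nu = 0 and sigma_11 = 1."""
    rng = random.Random(seed)
    rows = cosine_sine_rows(n)
    w = math.sqrt(3.0)
    draws = []
    for _ in range(reps):
        x = [rng.uniform(-w, w) for _ in range(n)]
        draws.append(tuple(sum(c * xv for c, xv in zip(row, x)) for row in rows))
    return draws


def moments(values):
    """Sample mean, variance and kurtosis (fourth central moment / variance^2)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    kurt = sum((v - mean) ** 4 for v in values) / len(values) / var ** 2
    return mean, var, kurt


draws = simulate(50, 10000)
y1 = [d[0] for d in draws]
y2 = [d[1] for d in draws]
print(moments(y1), sum(a * b for a, b in zip(y1, y2)) / len(y1))
```

For comparison, a single uniform variable has kurtosis 1.8. The combinations $y_{1\gamma n}$ should come out close to the normal value 3.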

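The $pm$-rowed square matrix $(\tau) = \|\tau_{rs}\|$ is defined as follows. If $r \le m$ and $s \le m$, then $\tau_{rs} = \sigma_{11}\rho_{rs}$. More generally, if $km < r \le (k+1)m$ and $lm < s \le (l+1)m$, where $k, l = 0, \ldots, p-1$, then $\tau_{rs} = \sigma_{k+1,l+1}\,\rho_{r-km,\,s-lm}$. The inverse matrix of $(\tau)$, and the determinants of $(\tau)$ and $(\tau)^{-1}$, are defined as are $(\sigma)^{-1}$, $\sigma$ and $\sigma^{-1}$.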
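Spelled out, the definition makes $(\tau)$ the Kronecker product of $(\sigma)$ and $\|\rho_{\gamma\delta}\|$. In other words, $(\tau)$ is the covariance matrix of $y_{11}, \ldots, y_{1m}, y_{21}, \ldots, y_{pm}$ listed row by row, since $E(y_{i\gamma n}y_{j\delta n}) = \sigma_{ij}\rho_{\gamma\delta n}$. The Python sketch below is an illustration with made-up matrices, and its function names are hypothetical. It builds $(\tau)$ entry by entry from the block rule in the text and compares the result with a plain Kronecker product.

```python
def tau_from_definition(sigma, rho):
    """Build (tau) entry by entry from the rule in the text:
    tau_{rs} = sigma_{k+1,l+1} * rho_{r-km, s-lm} when km < r <= (k+1)m and
    lm < s <= (l+1)m, with r, s counted from 1 as in the paper."""
    p, m = len(sigma), len(rho)
    tau = [[0.0] * (p * m) for _ in range(p * m)]
    for r in range(1, p * m + 1):
        k = (r - 1) // m  # block row index: km < r <= (k+1)m
        for s in range(1, p * m + 1):
            ell = (s - 1) // m  # block column index (the l of the text)
            tau[r - 1][s - 1] = sigma[k][ell] * rho[r - k * m - 1][s - ell * m - 1]
    return tau


def kronecker(a, b):
    """Kronecker product of square matrices a (p x p) and b (m x m)."""
    p, m = len(a), len(b)
    return [[a[r // m][s // m] * b[r % m][s % m] for s in range(p * m)]
            for r in range(p * m)]


sigma = [[2.0, 0.5], [0.5, 1.0]]                            # hypothetical (sigma)
rho = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]]  # hypothetical ||rho||
print(tau_from_definition(sigma, rho) == kronecker(sigma, rho))  # True
```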

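COROLLARY II. Let $\mathcal{H}_n$ be true, and let

$$\lim_{n\to\infty}\rho_{\gamma\delta n} = \rho_{\gamma\delta}, \qquad \rho_{\gamma\gamma} = 1.$$

Then, if $\lim_{n\to\infty} d_n = 0$, it follows that

$$\lim_{n\to\infty} F(Y_n) = F(Y),$$

where $F(Y)$ is the distribution function determined by the probability density

$$(2\pi)^{-\frac12 pm}\,\tau^{-\frac12}\exp\Big[-\tfrac12\sum_{r,s=1}^{pm}\tau^{rs}\,y_{k+1,\,r-km}\,y_{l+1,\,s-lm}\Big].$$

Here $k$ and $l$ are determined by $r$ and $s$: if $r \le m$ and $s \le m$, then $k = 0$ and $l = 0$; if $r \le m$ and $m < s \le 2m$, then $k = 0$ and $l = 1$; and so on.

The proof is omitted.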

If $Z_1, \ldots, Z_t$ are random variables, then $F(X_1, \ldots, X_k \mid Z_1, \ldots, Z_t)$ is the distribution function of the random vectors $X_1, \ldots, X_k$ for fixed values of $Z_1, \ldots, Z_t$, i.e. for any fixed values of $Z_1, \ldots, Z_t$,

[display illegible in the source]


We shall now assume that the elements $c_{\gamma\nu n}$ of the matrix $C_n$ are Borel measurable functions of a set of random variables¹¹ $Z_1, \ldots, Z_{t_n}$. Then the matrix $C_n$ may be called a random matrix defined on a space $W_n$, which is the combinatory product of the spaces on which $Z_1, \ldots, Z_{t_n}$ are defined. Suppose that, for each value of $n$ and for all $X^n$ and $Z^n$, the equation

$$F(X^n, Z^n) = F(Z^n)\cdot\prod_\nu F(X_\nu \mid Z^n) \tag{2.5}$$

is satisfied. Then we shall say that $\mathcal{J}$ is true. It is obvious that sufficient conditions for the truth of $\mathcal{J}$ are

$$F(X^n, Z^n) = F(Z^n)\cdot\prod_\nu F(X_\nu)$$

or, if $t_n > n$,

$$F(X^n, Z^n) = F(Z_{n+1}, \ldots, Z_{t_n})\cdot\prod_\nu F(X_\nu, Z_\nu)$$


11 The symbol $X^n$ will stand for the set of variables $X_1, \ldots, X_n$, and the symbol $Z^n$ will stand for the set of variables $Z_1, \ldots, Z_{t_n}$.









or, if $t_n < n$,

$$F(X^n, Z^n) = \prod_{\nu=1}^{t_n} F(X_\nu, Z_\nu)\cdot\prod_{\nu=t_n+1}^{n} F(X_\nu).$$


Inasmuch as we shall often use Fubini's theorem, it is stated here.¹²

THEOREM II. Let the distribution function of $X^n, Z^n$ be $F(X^n, Z^n)$, let the distribution function of $X^n$ for fixed values of $Z^n$ be $F(X^n \mid Z^n)$, and let the distribution function of $Z^n$ be $F(Z^n)$. Suppose that $A(X^n, Z^n)$ is measurable with respect to $F(X^n, Z^n)$ and that

$$\int_{R^{pn}\times W_n} |A(X^n, Z^n)|\, dF(X^n, Z^n) < \infty.$$

Then

$$\int_{R^{pn}} |A(X^n, Z^n)|\, dF(X^n \mid Z^n) < \infty$$

for almost all¹³ sets of values of $Z^n$, and

$$\int_{R^{pn}\times W_n} A(X^n, Z^n)\, dF(X^n, Z^n) = \int_{W_n}\Big[\int_{R^{pn}} A(X^n, Z^n)\, dF(X^n \mid Z^n)\Big]\, dF(Z^n).$$


In Corollary I an important condition was that the maximum of the absolute values of the elements of $C_n$ should approach zero as $n$ increased. To obtain a similar condition when the elements of $C_n$ are random variables, we define the function $d(C_n)$ as follows: for each value of $Z^n$ let $d(C_n)$ be the maximum of the absolute values of the elements of $C_n$. We shall denote $d(C_n)$ by $d_n$. If the elements of $C_n$ are Borel measurable functions, then $d_n$ is a Borel measurable function of $Z^n$. Hence $d_n$ is a random variable defined on $W_n$.

A sequence of random variables $d_1, d_2, \ldots$ is said to converge in probability to zero if, given $\epsilon > 0$,

$$\lim_{n\to\infty} P\{|d_n| > \epsilon\} = 0.$$






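If the sequence of functions $d_m, d_{m+1}, \ldots$ converges in probability to zero, we shall say that $\mathcal{Z}$ is true.

If $\mathcal{J}$ is true, and if, for almost all values of $Z^n$, we have

$$\int_{R^p} x_{i\nu}\, dF(X_\nu \mid Z^n) = 0, \tag{2.6}$$

$$\int_{R^p} x_{i\nu}\, x_{j\nu}\, dF(X_\nu \mid Z^n) = \sigma_{ij}, \tag{2.7}$$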


12 Proofs of Fubini's theorem with the required amount of generality will be found in [5, p. 101] and [14, p. 73].

13 A proposition concerning random variables is said to be true for almost all values of 
the variables, if it is true for all values of the variables, except perhaps for a set of proba- 
bility zero with respect to the distribution function of the random variables. 
and the condition $\mathcal{L}_n$ is satisfied with respect to the $X_\nu$ and the distribution functions $F(X_\nu, Z^n)$, then we shall say that $\mathcal{H}'_n$ is true. If

$$\sum_\nu \int_{R^p\times W_n} c_{\gamma\nu n}\, c_{\delta\nu n}\, x_{i\nu}\, x_{j\nu}\, dF(X_\nu, Z^n) = \sigma_{ij}\,\delta_{\gamma\delta}, \tag{2.8}$$

then we shall say that $\mathcal{C}'$ is true. It is noted that if $\mathcal{J}$ and (2.7) are true, then $\mathcal{C}'$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $Z^n$.

COROLLARY III. Let $\mathcal{C}'$, $\mathcal{J}$ and $\mathcal{H}'_n$ be true. Then, if $\mathcal{Z}$ is true, it follows that

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)).$$


PROOF. It is necessary to show two things, when the $x_{i\gamma\nu n}$ of Theorem I are set equal to the $c_{\gamma\nu n}x_{i\nu}$ of Corollary III. First, if the condition $\mathcal{L}_n$ is satisfied by the variables $x_{i\nu}$, then the condition $\mathcal{L}_{pmn}$ is satisfied by the variables $c_{\gamma\nu n}x_{i\nu}$. Second, the condition $\mathcal{Z}$ implies that $\lim_{n\to\infty} d_n = 0$.
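If we let $A_{\nu n} = \sum_{\gamma, i}(c_{\gamma\nu n}x_{i\nu})^2$ and $A_n = \sum_\nu A_{\nu n}$, and let $s_n^2 = E\{A_n\}$, then, by (2.8), it is true that

$$s_n^2 = \sum_{\gamma, i}\sigma_{ii} = m\sum_i \sigma_{ii}.$$

For sufficiently large $n$, $|d_n^2(Z^n)| < 1$ for almost all $Z^n$. From this fact and $\mathcal{H}'_n$ we have, for any preassigned $\epsilon$ and $\delta$,

$$\frac{1}{s_n^2}\sum_\nu\int_{A_{\nu n} > \epsilon s_n^2} A_{\nu n}\, dF(X^n, Z^n) \le \frac{1}{s_n^2}\sum_\nu\int_{A_{\nu n} > \epsilon s_n^2} m\, d_n^2(Z^n)\sum_i x_{i\nu}^2\, dF(X_\nu, Z^n) < \delta$$

for sufficiently large $n$. This holds because the set of $x$'s and $Z^n$ for which $\sum_i x_{i\nu}^2 > \epsilon s_n^2$ contains almost all the $x$'s and $Z^n$ for which $A_{\nu n} > \epsilon s_n^2$. Hence the condition $\mathcal{L}_{pmn}$ is satisfied by the random variables $c_{\gamma\nu n}x_{i\nu}$ with respect to the distribution functions $F(X_\nu, Z^n)$.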
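We now show that

$$\lim_{n\to\infty}\Big[\max E\{(c_{\gamma\nu n}x_{i\nu})^2\}\Big] = 0.$$

It is clearly true that

$$E\{(c_{\gamma\nu n}x_{i\nu})^2\} \le \int_{R^p\times W_n} d_n^2\, x_{i\nu}^2\, dF(X_\nu, Z^n).$$

Since $d_n$ converges in probability to zero, and since $d_n \le 1$ for almost all $Z^n$, we can, for any $\epsilon > 0$, take $n_0$ so large that if $n > n_0$, then $P\{d_n^2 > \tfrac12\epsilon\} < \tfrac12\epsilon$. Let $E$ be the set on which $d_n^2 > \tfrac12\epsilon$. Using (2.7), we then have for all $n > n_0$

$$E\{(c_{\gamma\nu n}x_{i\nu})^2\} \le \int_E\Big[\int_{R^p} x_{i\nu}^2\, dF(X_\nu \mid Z^n)\Big]dF(Z^n) + \frac{\epsilon}{2}\int_{W_n - E}\Big[\int_{R^p} x_{i\nu}^2\, dF(X_\nu \mid Z^n)\Big]dF(Z^n) < \epsilon\,\sigma_{ii},$$

and this inequality is satisfied for every $\nu$, $\gamma$ and $i$ when $n > n_0$.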
The following discussion is useful in obtaining the limiting distributions of statistics which occur in multivariate statistical analysis.

The letter $f$ will assume all integral values from 1 through $s$. For any $f$, the letters $\mu, \nu$ will assume all integral values from 1 through $n_f$, and the letters $\gamma, \delta$ all integral values from 1 through $m_f$.

For any fixed $f$, let $X_1^f, X_2^f, \ldots$ be a sequence of independently distributed random vectors of $p_f$ components defined on $R^{p_f}$.

If, for each set of values of $n_1, \ldots, n_s$ ($t_n$ is a function of $n_1, \ldots, n_s$),

$$F(X_1^1, \ldots, X_{n_s}^s, Z_1, \ldots, Z_{t_n}) = \prod_f\prod_\nu F(X_\nu^f \mid Z_1, \ldots, Z_{t_n})\cdot F(Z_1, \ldots, Z_{t_n}),$$

we shall say that $\mathcal{J}_s$ is true.

For any fixed value of $f$, let the matrix¹⁴ $C_n^f = \|c_{\gamma\nu n}^f\|$ have the same properties as $C_n$, where the $c_{\gamma\nu n}^f$ are Borel measurable functions of the $X_\mu^k$ $(k < f)$ and¹⁵ $Z^n$. Let $d(C_n^f)$ be the same function of $C_n^f$ that $d(C_n)$ is of $C_n$. We shall denote $d(C_n^f)$ by $d_n^f$.

Let

$$y_{i\gamma n}^f = \sum_\nu c_{\gamma\nu n}^f\, x_{i\nu}^f$$

and let $Y_n^f = \|y_{i\gamma n}^f\|$.

For fixed $f$, the $p_f$-rowed square matrix $(\sigma_f)$, its inverse, and so on are defined as were the same functions of the $\sigma_{ij}$ earlier in this paragraph, but with $\sigma_{ijf}$ replacing $\sigma_{ij}$, where

$$E\{x_{i\nu}^f\} = 0$$

and

$$E\{x_{i\nu}^f\, x_{j\nu}^f\} = \sigma_{ijf}.$$


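If $\mathcal{J}_s$ is true, and if, for almost all values of $Z^n$, we have

$$\int_{R^{p_f}} x_{i\nu}^f\, dF(X_\nu^f \mid Z^n) = 0, \tag{2.9}$$

$$\int_{R^{p_f}} x_{i\nu}^f\, x_{j\nu}^f\, dF(X_\nu^f \mid Z^n) = \sigma_{ijf}, \tag{2.10}$$

and the condition $\mathcal{L}_{n_f}$ is satisfied with respect to the $X_\nu^f$ and the distribution functions $F(X_\nu^f, Z^n)$, then we shall say that $\mathcal{H}'_{n_f}$ is true.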


If

$$\sum_\nu \int c_{\gamma\nu n}^f\, c_{\delta\nu n}^f\, x_{i\nu}^f\, x_{j\nu}^f\, dF(X_\nu^f, Z^n) = \sigma_{ijf}\,\delta_{\gamma\delta}, \tag{2.11}$$

then we shall say that $\mathcal{C}''$ is true. It is noted that if $\mathcal{J}_s$ and (2.10) are true, then $\mathcal{C}''$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of the $X_\nu^k$ $(k < f)$ and $Z^n$.

14 The superscripts $f$ and $k$ will not indicate multiplication but will only be indices.

15 See footnote 11.
ae 

If $d_n^f$ converges in probability to zero as $n$ increases, we shall say that $\mathcal{Z}_f$ is true.


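COROLLARY IV. Let $\mathcal{C}''$, $\mathcal{J}_s$ and $\mathcal{H}'_{n_1}, \ldots, \mathcal{H}'_{n_s}$ be true. Then, if $\mathcal{Z}_1, \ldots, \mathcal{Z}_s$ are true, it follows that

$$\lim_{n_1, \ldots, n_s\to\infty} F(Y_{n_1}^1, \ldots, Y_{n_s}^s) = \prod_f F(Y^f),$$

where

$$F(Y^f) = \prod_\gamma N(y_{1\gamma}^f, \ldots, y_{p_f\gamma}^f; (\sigma_f)).$$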


The proof is almost identical with the proof of Corollary III, of which this corollary is an extension.

Most statistics studied are associated with the normal distribution. For such statistics Corollary IV may not be the best tool to use when their limiting distributions are desired. Such statistics are generally expressible as functions of uncorrelated random variables, and hence are more simply discussed using Corollary I.





3. Limiting distributions of quadratic and bilinear forms. We first assume the coefficients of the forms to be constants. For each set of values of $i$, $j$ and $n$, consider the bilinear form with real coefficients

$$b_{ijn} = \sum_{\mu,\nu} a_{\mu\nu n}\, x_{i\mu}\, x_{j\nu}. \tag{3.1}$$

Its matrix will be denoted by $A_n$, and the rank of $A_n$ will be denoted by $m$. The maximum of the absolute values of the elements of $A_n$ will be denoted by $b_n$. We shall assume that there exists an orthogonal transformation

$$y_{i\mu n} = \sum_\nu c_{\mu\nu n}\, x_{i\nu} \tag{3.2}$$

of $x_{i1}, \ldots, x_{in}$ such that

$$b_{ijn} = \sum_\delta \lambda_\delta\, y_{i\delta n}\, y_{j\delta n}, \tag{3.3}$$

where the coefficients $\lambda_\delta$ are non-negative.¹⁶

LEMMA I. Let $d_n$ be the maximum of the absolute values of the elements $c_{\gamma\nu n}$. Then $\lim_{n\to\infty} b_n = 0$ if and only if $\lim_{n\to\infty} d_n = 0$.








16 Our theorems will not be applicable if some of the $\lambda_\delta$ are negative and some are positive. However, if all the $\lambda_\delta$ are non-positive, then the theorems will remain true.
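PROOF. From (3.1), (3.2) and (3.3) it follows that

$$a_{\mu\nu n} = \sum_\delta \lambda_\delta\, c_{\delta\mu n}\, c_{\delta\nu n}.$$

Hence $b_n \ge a_{\mu\mu n} \ge \lambda_\gamma c_{\gamma\mu n}^2$ and $|a_{\mu\nu n}| \le d_n^2\left(\sum_\delta \lambda_\delta\right)$. The remainder of the proof is obvious.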
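The two inequalities in the proof can be checked numerically. The Python sketch below uses a hypothetical example with $m = 2$. The orthonormal rows are the normalized constant vector and the normalized cosine vector $\sqrt{2/n}\cos(2\pi\nu/n)$, with $\lambda_1 = 1$ and $\lambda_2 = 2$. The sketch forms $a_{\mu\nu n} = \sum_\delta\lambda_\delta c_{\delta\mu n}c_{\delta\nu n}$ and checks $\lambda_\gamma c_{\gamma\mu n}^2 \le b_n \le d_n^2\sum_\delta\lambda_\delta$. It also shows that $b_n$ and $d_n$ shrink together as $n$ grows. The function names are illustrative.

```python
import math


def example_rows(n):
    """Two orthonormal rows (valid for n >= 3): the normalized constant vector
    and the normalized cosine vector sqrt(2/n) cos(2 pi nu / n)."""
    const = [1.0 / math.sqrt(n)] * n
    cosine = [math.sqrt(2.0 / n) * math.cos(2 * math.pi * nu / n) for nu in range(1, n + 1)]
    return [const, cosine]


def form_matrix(rows, lambdas):
    """a_{mu nu n} = sum over delta of lambda_delta * c_{delta mu n} * c_{delta nu n}."""
    n = len(rows[0])
    return [[sum(lam * row[mu] * row[nu] for lam, row in zip(lambdas, rows))
             for nu in range(n)] for mu in range(n)]


def b_and_d(n, lambdas=(1.0, 2.0)):
    """Return b_n (max |a_{mu nu n}|), d_n (max |c_{gamma nu n}|), the rows and A_n."""
    rows = example_rows(n)
    a = form_matrix(rows, lambdas)
    b_n = max(abs(v) for row in a for v in row)
    d_n = max(abs(v) for row in rows for v in row)
    return b_n, d_n, rows, a


for n in (20, 100, 400):
    b_n, d_n, _, _ = b_and_d(n)
    print(n, round(b_n, 4), round(d_n, 4), b_n <= 3.0 * d_n ** 2)  # 3.0 = sum of lambdas
```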
The following theorem will be the basis for a large sample analogue of Wishart's distribution.

THEOREM III. Let $\mathcal{H}_n$ be true, and let $b_{ijn} = \sum_\delta \lambda_\delta\, y_{i\delta n}\, y_{j\delta n}$. Then a sufficient condition that

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma))$$

is $\lim_{n\to\infty} b_n = 0$.

PROOF. According to Lemma I, the fact that $\lim_{n\to\infty} b_n = 0$ implies that $\lim_{n\to\infty} d_n = 0$. The $y_{i\gamma n}$ are such that $\mathcal{C}$ is true. Hence the hypotheses of Corollary I are satisfied and the theorem is proved.

Before stating the corollary to Theorem III, we shall prove an obvious lemma which is of constant service.

LEMMA II. Let $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of $F(X)$, and let

$$g_{1n} = g_1(x_{1n}, \ldots, x_{pn}), \quad \ldots, \quad g_{kn} = g_k(x_{1n}, \ldots, x_{pn})$$

be Borel measurable functions of their indicated variables for each value of $n$ $(p \ge k)$, defined on $R^p$. Then

$$\lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k)$$

at all points of continuity of $F(g_1, \ldots, g_k)$, where $g_\alpha = g_\alpha(x_1, \ldots, x_p)$.
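PROOF. By (2.1), we have

$$E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_{\alpha n}}\big\} = E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}\big\}. \tag{3.4}$$

Here, since $g_\alpha(x_1, \ldots, x_p)$ is a Borel measurable function of $x_1, \ldots, x_p$, we know that $g_{1n}, \ldots, g_{kn}$ have a joint distribution function $F(g_{1n}, \ldots, g_{kn})$.¹⁷ Then, since $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of $F(X)$, we have

$$\lim_{n\to\infty} E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}\big\} = E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_\alpha(x_1, \ldots, x_p)}\big\}$$

uniformly in every $t_1, \ldots, t_k$ interval, since

$$\Big|E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}\big\} - E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_\alpha(x_1, \ldots, x_p)}\big\}\Big| \le \int \big|dF_n(X_1, \ldots, X_p) - dF(X_1, \ldots, X_p)\big|.$$

Here $F_n(X_1, \ldots, X_p)$ stands for $F(X_{1n}, \ldots, X_{pn})$ when $X_i$ and $X_{in}$ have the same numerical values. It follows from (3.4) that

$$\lim_{n\to\infty} E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_{\alpha n}}\big\} = E\big\{e^{\mathrm{i}\sum_\alpha t_\alpha g_\alpha}\big\}$$

uniformly in every $t_1, \ldots, t_k$ interval, and consequently

$$\lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k)$$

at all points of continuity of $F(g_1, \ldots, g_k)$.

17 See Cramér [3, p. 30] and "Additional Note" at the end of the book.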
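The real valued function $G_a(x; n, c)$ will be defined by the equations

$$G_a(0; 0, c) = 1, \qquad (-\infty < c < \infty),$$

$$G_a(x; n, c) = \big[\Gamma(\tfrac12 n)\big]^{-1}(2c)^{-\frac12 n}\,x^{\frac12 n - 1}\exp\Big[-\frac{x}{2c}\Big], \qquad (0 < x < \infty;\ c > 0;\ n > 0),$$

and $G_a(x; n, c) = 0$ otherwise. The function $G(x; n, c)$ will be defined by the equation

$$G(x; n, c) = \int_0^x G_a(t; n, c)\, dt.$$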
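For orientation, $G_a(x; n, c)$ is the density of $c$ times a $\chi^2$ variable with $n$ degrees of freedom, that is, a gamma density with shape $\tfrac12 n$ and scale $2c$. Thus $G(b; m, \sigma_{ii})$ in the remark after Corollary V is the distribution of $\sigma_{ii}\chi^2_m$. The Python sketch below evaluates $G_a$ for $n > 0$ and approximates $G$ by the composite midpoint rule. The rule and the step count are arbitrary choices made for illustration. The midpoint rule converges slowly for $n = 1$, where the integrand is unbounded at $t = 0$. The degenerate convention $G_a(0; 0, c) = 1$ is not modelled.

```python
import math


def g_density(x, n, c):
    """G_a(x; n, c) for n > 0, c > 0: zero for x <= 0, otherwise
    x^(n/2 - 1) exp(-x / (2c)) / (Gamma(n/2) (2c)^(n/2))."""
    if x <= 0:
        return 0.0
    return (x ** (n / 2 - 1) * math.exp(-x / (2 * c))
            / (math.gamma(n / 2) * (2 * c) ** (n / 2)))


def g_cdf(x, n, c, steps=4000):
    """G(x; n, c): integral of G_a(t; n, c) over 0 < t < x, by the composite
    midpoint rule (which never evaluates the integrand at t = 0)."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(g_density((k + 0.5) * h, n, c) for k in range(steps))


# for n = 2 there is a closed form: G(x; 2, c) = 1 - exp(-x / (2c))
print(g_cdf(5.0, 2, 1.5), 1 - math.exp(-5.0 / 3.0))
```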


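The real valued function $G_a(x_{11}, x_{12}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the equations

$$G_a(0, \ldots, 0; p-1, (\sigma)) = 1,$$

$$G_a(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \frac{|x|^{\frac12(n-p-1)}\exp\big[-\tfrac12\sum_{i,j}\sigma^{ij}x_{ij}\big]}{2^{\frac12 np}\,\pi^{\frac14 p(p-1)}\,\sigma^{\frac12 n}\prod_i\Gamma\big(\tfrac12(n-i+1)\big)},$$

$$(0 < x_{ii} < \infty;\ x_{ij}^2 < x_{ii}x_{jj});\quad (\sigma)\ \text{is positive definite},$$

where $|x|$ is the determinant $|x_{ij}|$, and $G_a(x_{11}, \ldots, x_{pp}; n, (\sigma)) = 0$ otherwise. The function $G(x_{11}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the equation

$$G(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \int_{-\infty}^{x_{pp}}\cdots\int_{-\infty}^{x_{11}} G_a(t_{11}, t_{12}, \ldots, t_{pp}; n, (\sigma))\, dt_{11}\, dt_{12}\cdots dt_{pp}.$$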


We can now state the limiting distribution analogue of Wishart's distribution.

COROLLARY V. If $\mathcal{H}_n$ is true, if $\lambda_\delta = 1$, and if $m > p$, then

$$\lim_{n\to\infty} F(b_{11n}, b_{12n}, \ldots, b_{ppn}) = G(b_{11}, \ldots, b_{pp}; m, (\sigma)).$$













PROOF. The conditions of Theorem III and Lemma II are satisfied.

Obviously, for fixed $i$, the limiting distribution of $b_{iin}$ is $G(b; m, \sigma_{ii})$. If $i \ne j$, the limiting distribution of $b_{ijn}/m$ is the distribution of the covariance of $x_i$ and $x_j$ in a sample of $m$ independent pairs of observations.¹⁸
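The case $p = 1$ of Corollary V can be explored by simulation. The setup below is hypothetical, not from the paper. It takes $n = 60$ and $m = 3$, with the rows of $C_n$ equal to the normalized constant, cosine and sine vectors and $\lambda_\delta = 1$. The $x_\nu$ are independent uniform variables with $\sigma_{11} = 1$. Then $E(b_{11n}) = m\sigma_{11} = 3$ exactly. The corollary suggests that $b_{11n}$ is approximately distributed as $G(b; 3, 1)$, the $\chi^2$ law with 3 degrees of freedom. That law has variance 6, and its distribution function has the closed form used in the hypothetical helper `chi2_cdf_3df`.

```python
import math
import random


def orthonormal_rows(n):
    """Three orthonormal rows c_{gamma nu n} (gamma = 1, 2, 3), valid for n >= 3:
    the normalized constant, cosine and sine vectors."""
    const = [1.0 / math.sqrt(n)] * n
    cos_row = [math.sqrt(2.0 / n) * math.cos(2 * math.pi * nu / n) for nu in range(1, n + 1)]
    sin_row = [math.sqrt(2.0 / n) * math.sin(2 * math.pi * nu / n) for nu in range(1, n + 1)]
    return [const, cos_row, sin_row]


def simulate_b11(n, reps, seed=7):
    """Draw b_{11n} = sum over gamma of y_{1 gamma n}^2, x_nu iid uniform with variance 1."""
    rng = random.Random(seed)
    rows = orthonormal_rows(n)
    w = math.sqrt(3.0)
    out = []
    for _ in range(reps):
        x = [rng.uniform(-w, w) for _ in range(n)]
        out.append(sum(sum(c * xv for c, xv in zip(row, x)) ** 2 for row in rows))
    return out


def chi2_cdf_3df(x):
    """G(x; 3, 1): chi-square distribution function with 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


sample = simulate_b11(60, 10000)
mean = sum(sample) / len(sample)
var = sum((b - mean) ** 2 for b in sample) / len(sample)
frac = sum(1 for b in sample if b <= 3.0) / len(sample)
print(mean, var, frac, chi2_cdf_3df(3.0))  # compare with 3, 6 and G(3; 3, 1)
```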


18 See Wishart and Bartlett [1, p. 266].
We proceed to the analogue for limiting distributions of one of our generalizations of the Fisher-Cochran theorem. It is first desirable to give some additional definitions.

We consider the bilinear forms

$$b_{ijn}^\alpha = \sum_{\mu,\nu} a_{\mu\nu n}^\alpha\, x_{i\mu}\, x_{j\nu} \tag{3.5}$$

with real coefficients, and we denote the matrix of $b_{ijn}^\alpha$ by $A_n^\alpha$. The rank of $A_n^\beta$ is $m_\beta$, and the rank of $A_n^k$ is $m_{kn}$. Let $b_n$ be the maximum of the absolute values of the elements of $A_n^1, \ldots, A_n^{k-1}$. Suppose there exists an orthogonal transformation

$$y_{i\mu n} = \sum_\nu c_{\mu\nu n}\, x_{i\nu} \tag{3.6}$$

of $x_{i1}, \ldots, x_{in}$ such that

$$b_{ijn}^\alpha = \sum_\delta \lambda_\delta\, y_{i\delta n}\, y_{j\delta n},$$

where $\delta$ assumes all integral values from $m_1 + \cdots + m_{\alpha-1} + 1$ through $m_1 + \cdots + m_\alpha$, and $\lambda_\delta$ is non-negative. Then it is easy to prove, as in Lemma I, that $\lim b_n = 0$ if and only if $\lim d_n = 0$. Here $d_n$ is the maximum of the absolute values of the elements $c_{\gamma\nu n}$.
LEMMA III. Let $m = m_1 + \cdots + m_{k-1}$ and let

$$\sum_\alpha b_{ijn}^\alpha = \sum_\nu x_{i\nu}\, x_{j\nu}. \tag{3.7}$$

Consider representations

$$b_{ijn}^\alpha = \sum_\delta y_{i\delta n}\, y_{j\delta n},$$

where the real linear functions $y_{i\delta n}$ of $x_{i1}, \ldots, x_{in}$ are given by (3.6). The linear functions (3.6) are not now assumed to be orthogonal. A necessary and sufficient condition for such a representation is

$$m_{kn} = n - m.$$

Furthermore, the functions (3.6) are orthogonal.

The proof of this lemma for the case $p = 1$ is given in [16]. The procedure to follow in extending the lemma to the cases where $p > 1$ is given in [15, p. 473]. This lemma is more general than the lemma in [15]. Here the orthogonality of the transformation is shown to be a consequence of our hypotheses, not assumed as one of them.¹⁹
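A $p = 1$, $k = 2$ instance may make Lemma III concrete. The familiar split $\sum_\nu x_\nu^2 = n\bar{x}_n^2 + \sum_\nu(x_\nu - \bar{x}_n)^2$ uses the matrices $J/n$ and $I - J/n$, where $J$ is the matrix of ones. Their ranks are $m_1 = 1$ and $m_{2n} = n - 1$, so the rank condition holds. Both matrices are idempotent, as the canonical form $\sum_\delta y_\delta^2$ with an orthogonal transformation requires. The Python sketch below checks this exactly. It contrasts this with a hypothetical split using $2J/n$, where the ranks add to $n + 1$ and the second matrix is not idempotent. The helper names are illustrative.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix of Fractions, by exact forward Gaussian elimination."""
    rows = [list(row) for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def split(n, weight):
    """A^1 = weight * J / n and A^2 = I - A^1, so that b^1 + b^2 = sum of x_nu^2."""
    a1 = [[Fraction(weight, n)] * n for _ in range(n)]
    a2 = [[(1 if i == j else 0) - a1[i][j] for j in range(n)] for i in range(n)]
    return a1, a2


a1, a2 = split(6, 1)   # n * xbar^2 and sum of (x - xbar)^2
print(rank(a1), rank(a2))  # 1 5: the ranks add to n = 6
b1, b2 = split(6, 2)   # hypothetical split that violates the rank condition
print(rank(b1), rank(b2))  # 1 6: the ranks add to n + 1
```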


19 It is noted, however, that the increase in generality affects only the necessity, not the sufficiency, of the theorem.



































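THEOREM IV. Let $\mathcal{H}_n$, (3.7) and (3.8) be true for all values of $n$, suppose that $\lim_{n\to\infty} b_n = 0$, and let $b_{ijn}^\alpha = \sum_\delta y_{i\delta n}\, y_{j\delta n}$. Then

$$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)).$$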


The proof is omitted. 
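COROLLARY VI. If the hypotheses of Theorem IV are assumed, and if $m_\beta > p$ $(\beta = 1, \ldots, h;\ h < k)$, then

$$\lim_{n\to\infty} F\big(b_{11n}^1, \ldots, b_{ppn}^h, y_{1,h+1,n}, \ldots, y_{pmn}\big) = \prod_{\beta=1}^{h} G\big(b_{11}^\beta, \ldots, b_{pp}^\beta; m_\beta, (\sigma)\big)\cdot\prod_{\gamma=h+1}^{m} N\big(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)\big).$$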


If p = 1 in Theorem IV and Corollary VI, we have the large sample analogue 
of the Fisher-Cochran theorem. 

We now discuss limiting distributions of random variables which are bilinear and quadratic forms in one set of chance variables for fixed values of other random variables. We consider the coefficients $a_{\mu\nu n}$ and $a_{\mu\nu n}^\alpha$ of $b_{ijn}$ and $b_{ijn}^\alpha$ to be random variables. Hence the matrices $A_n$ and $A_n^\alpha$ are random matrices.


To be more explicit, let $X_1^f, X_2^f, \ldots$ be a sequence of random vectors, the random vector $X_\nu^f$ having $p_f$ components $x_{1\nu}^f, \ldots, x_{p_f\nu}^f$ and being defined on $R^{p_f}$. The set of random vectors $X_\nu^f$ and $Z_1, \ldots, Z_{t_n}$ will be assumed to be independent.

For each value of $f$, consider the bilinear forms

$$b_{ijn_f}^{f\alpha} = \sum_{\mu,\nu=1}^{n_f} a_{\mu\nu n_f}^{f\alpha}\, x_{i\mu}^f\, x_{j\nu}^f, \qquad (i, j = 1, \ldots, p_f;\ \alpha = 1, \ldots, k_f). \tag{3.9}$$

Their coefficients will be assumed to be Borel measurable functions of the random vectors $X_\nu^1, \ldots, X_\nu^{f-1}$ and $Z_1, \ldots, Z_{t_n}$. The matrix of $b_{ijn_f}^{f\alpha}$ is denoted by $A_{n_f}^{f\alpha}$. The rank of $A_{n_f}^{f\beta}$ is $m_{f\beta}$, and the rank of $A_{n_f}^{fk_f}$ is $m_{k_fn_f}$. These ranks hold for all sets of values of the $a_{\mu\nu n_f}^{f\alpha}$ except, perhaps, on a set $E_n$ such that $\lim_{n\to\infty} P(E_n) = 0$.

Let the function $b(A_{n_f}^{f\alpha})$ be defined as follows. For each set of values of the $X_\nu^h$ and $Z^n$, let $b(A_{n_f}^{f\alpha})$ be the maximum of the absolute values of the elements of $A_{n_f}^{f\alpha}$. We shall denote $b(A_{n_f}^{f\alpha})$ by $b_{n_f}^{f\alpha}$. Obviously, $b_{n_f}^{f\alpha}$ is a Borel measurable function of the $X_\nu^h$ and $Z^n$. Hence $b_{n_f}^{f\alpha}$ is a random variable defined on $W_n \times R^{n_1p_1 + \cdots + n_{f-1}p_{f-1}}$.
For each value of $f$, and for almost all sets of fixed values of the $X_\nu^h$ $(h = 1, \ldots, f-1)$, we shall assume that there exists an orthogonal transformation

$$y_{i\mu n_f}^f = \sum_\nu c_{\mu\nu n_f}^f\, x_{i\nu}^f \tag{3.10}$$

of $x_{i1}^f, \ldots, x_{in_f}^f$ such that²⁰

$$b_{ijn_f}^{f\alpha} = \sum_\lambda y_{i\lambda n_f}^f\, y_{j\lambda n_f}^f, \tag{3.11}$$

where $\lambda$ assumes all integral values from $m_{f1} + \cdots + m_{f,\alpha-1} + 1$ through $m_{f1} + \cdots + m_{f\alpha}$. For fixed values of the $X_\nu^h$ and $Z^n$, the coefficients $c_{\mu\nu n_f}^f$ of the linear forms (3.10) are real single valued Borel measurable functions of the coefficients $a_{\mu\nu n_f}^{f\alpha}$ of the bilinear forms (3.9). Let $b_{n_f}^f$ be the same function of the $a_{\mu\nu n_f}^{f\alpha}$ that $b_n$ is of the coefficients of the bilinear forms having constant coefficients. Furthermore, let $d_{n_f}^f$ be the same function of the matrix $C_{n_f}^f = \|c_{\lambda\nu n_f}^f\|$ $(\lambda = 1, \ldots, m_f)$ that $d_n$ is of $C_n$, where $m_f = m_{f1} + \cdots + m_{f,k_f-1}$.
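LEMMA IV. $b_{n_f}^f$ converges in probability to zero as $n$ increases if and only if $d_{n_f}^f$ converges in probability to zero as $n$ increases.

PROOF. Since

$$\sum_{\beta=1}^{k_f-1} a_{\mu\nu n_f}^{f\beta} = \sum_\lambda c_{\lambda\mu n_f}^f\, c_{\lambda\nu n_f}^f,$$

we have

$$(k_f - 1)\, b_{n_f}^f \ge \sum_\beta a_{\mu\mu n_f}^{f\beta} \ge \big[c_{\lambda\mu n_f}^f\big]^2$$

and

$$\big|a_{\mu\nu n_f}^{f\beta}\big| \le \Big\{\sum_\lambda \big[c_{\lambda\mu n_f}^f\big]^2\cdot\sum_\lambda \big[c_{\lambda\nu n_f}^f\big]^2\Big\}^{1/2} \le m_{f\beta}\big[d_{n_f}^f\big]^2.$$

In the last relation, $\lambda$ assumes all integral values from $m_{f1} + \cdots + m_{f,\beta-1} + 1$ through $m_{f1} + \cdots + m_{f\beta}$. The remainder of the proof is obvious.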


In proving Theorem V we shall use a generalization of Lemma III which is proved in [15, p. 473].


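THEOREM V. Let $\mathcal{H}'_{n_1}, \ldots, \mathcal{H}'_{n_s}$ be true, and suppose that

$$\sum_\alpha b_{ijn_f}^{f\alpha} = \sum_\nu x_{i\nu}^f\, x_{j\nu}^f.$$

Suppose further that $b_{n_f}^f$ converges in probability to zero as $n_f$ increases, and that $m_{k_fn_f} = n_f - m_f$ for all values of $n_f$. Then

$$\lim_{n_1, \ldots, n_s\to\infty} F\big(y_{11n_1}^1, \ldots, y_{p_sm_sn_s}^s\big) = \prod_f\prod_{\gamma=1}^{m_f} N\big(y_{1\gamma}^f, \ldots, y_{p_f\gamma}^f; (\sigma_f)\big).$$

The proof is omitted.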


20 It is not necessary that the $\lambda_\delta$ be set equal to one as in (3.11). It is only somewhat easier to state the results.
COROLLARY VII. If $m_{f\beta} > p_f$, then

$$\lim_{n_1, \ldots, n_s\to\infty} F\big(b_{11n_1}^{11}, \ldots, b_{p_sp_sn_s}^{s,k_s-1}\big) = \prod_f\prod_{\beta=1}^{k_f-1} G\big(b_{11}^{f\beta}, \ldots, b_{p_fp_f}^{f\beta}; m_{f\beta}, (\sigma_f)\big).$$

The proof is omitted.

Finally, let us assume that the vectors $X_\nu^f$ are uncorrelated for fixed $\nu$ and independent for fixed $f$. By this we mean two things. First, $E(x_{i\nu}^f x_{j\nu}^g) = \sigma_{ijf}\,\delta_{fg}$. Second, for all $n$, the random vectors $X_\nu^f$ are independent whenever their subscripts are all different, whether their superscripts are the same or different. Let us also assume that the coefficients of the forms (3.9) are real numbers. Thus we have weakened the hypotheses of Theorem V concerning the random vectors and strengthened those concerning the forms (3.9). We are generally concerned with the limiting distributions of statistics which occur in the analysis of the normal distribution. Many such statistics have been shown to be invariant under transformations into uncorrelated random variables,²¹ so Theorem VI and Corollary VIII will often be applicable.

THEOREM VI. The statement of Theorem V is repeated.

COROLLARY VIII. The statement of Corollary VII is repeated.

Another extension of these theorems may be obtained by allowing all the $n_f$ to be equal, i.e. $n_1 = \cdots = n_s = n$. One would then impose conditions on the forms (3.9) which ensure that, for fixed $i$, $f$, $\gamma$ and $n$, the random variables $c_{\gamma\nu n}^f x_{i\nu}^f$ are independently distributed. Theorem I could then be used to obtain a very general result. However, except for the case dealt with above, the condition of independence appears to be rather restrictive, and the theorem is omitted.


4. Applications. We first state the strong law of large numbers and a lemma which is very useful in the discussion of limiting distributions.

A sequence of random variables $X_1, X_2, \ldots$ will be said to converge with probability one²² to a random variable $X$ if

$$\lim_{n\to\infty} P\{|X_n - X| < \epsilon,\ |X_{n+1} - X| < \epsilon,\ \ldots,\ |X_{n+p} - X| < \epsilon\} = 1$$

for every value of $p > 0$, uniformly in $p$, for every positive number $\epsilon$. Upon setting $p = 1$, it is seen that convergence with probability one implies convergence in probability.

The strong law of large numbers²³ asserts that if the independent random variables $X, X_1, X_2, \ldots$ all have the same distribution function, and if $E(X)$ is finite, then the sequence of arithmetic means $\frac{1}{n}\sum_{\nu=1}^{n} X_\nu$ converges with probability one to $E(X)$.
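A short simulation can make the strong law concrete. The sketch below assumes, purely for illustration, that the $X_\nu$ are uniform on $(0, 1)$, so that $E(X) = 1/2$; the function name `running_means` is hypothetical.

```python
import random

def running_means(n, seed=0):
    """Return the arithmetic means (1/k) * sum_{v<=k} X_v for k = 1..n,
    with the X_v independent and uniform on (0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for k in range(1, n + 1):
        total += rng.random()
        means.append(total / k)
    return means

means = running_means(100_000)
# Later terms of the sequence should lie ever closer to E(X) = 0.5.
print(means[99], means[9_999], means[-1])
```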


²¹ The regression transformation which yields the uncorrelated variables will be found in [15, p. 476, (3.2)].
²² See Doob [4, p. 163], and Fréchet [9, p. 228].
²³ See Doob [4, p. 163], and Fréchet [9, p. 259]. A complete proof is given by Fréchet.
Hence, if $E(x_{i\nu}) = 0$ and if $\sigma_{ij}$ is finite, then $\frac{1}{n}\sum_\nu x_{i\nu}x_{j\nu} = s'_{ijn}$ converges with probability one to $\sigma_{ij}$. Since $\sum_\nu (x_{i\nu} - \bar x_{in})(x_{j\nu} - \bar x_{jn}) = \sum_\nu x_{i\nu}x_{j\nu} - n\bar x_{in}\bar x_{jn}$, where $\bar x_{in}$ is the arithmetic mean of $x_{i1}, \ldots, x_{in}$, and since $\bar x_{in}$ converges with probability one to zero, it follows that $s_{ijn} = s'_{ijn} - \bar x_{in}\bar x_{jn}$ converges with probability one to $\sigma_{ij}$. It is, of course, assumed that the random variables $x_{i\nu}, x_{j\nu}$ have the same joint distribution function for all values of $\nu$, and that the random vectors $X_1, X_2, \ldots$ are independently distributed. The process of the reduction of $s'_{ijn}$ to $s_{ijn}$ in the limit is an example of the possible uses of:

LEMMA V. If $g(t_1, \ldots, t_p)$ is a continuous function of $t_1, \ldots, t_p$, and if the sequence of random variables $x_{in}$ converges in probability (with probability one) to $x_i$, which may be a random variable or a constant, then the sequence of random variables $g(x_{1n}, \ldots, x_{pn})$ converges in probability (with probability one) to $g(x_1, \ldots, x_p)$, where some or all of the $x$'s may be constants. If $x_1, \ldots, x_p$ are constants, then $g(t_1, \ldots, t_p)$ need only be continuous in the neighborhood of $x_1, \ldots, x_p$ and Borel measurable.

For a proof of part of this lemma, which may be extended to yield the entire proof, see Fréchet [9, p. 178].
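The reduction of $s'_{ijn}$ to $s_{ijn}$ can be checked numerically. The sketch assumes, for illustration only, pairs built from uniform variables with zero means, unit variances and $\sigma_{ij} = 0.6$ (so the data are not normal); the function name `cross_moments` is hypothetical. It confirms the algebraic identity $s_{ijn} = s'_{ijn} - \bar x_{in}\bar x_{jn}$ and shows both statistics settling near $\sigma_{ij}$.

```python
import math
import random

def cross_moments(n, rho=0.6, seed=1):
    """Draw n independent pairs (x_i, x_j) with zero means, unit variances and
    covariance sigma_ij = rho, built from uniform variables (not normal).
    Returns s'_ijn (moment about the true means), s_ijn (about the sample
    means) and the two sample means."""
    rng = random.Random(seed)
    a = math.sqrt(3.0)                      # uniform(-a, a) has variance 1
    xi, xj = [], []
    for _ in range(n):
        u, w = rng.uniform(-a, a), rng.uniform(-a, a)
        xi.append(u)
        xj.append(rho * u + math.sqrt(1.0 - rho * rho) * w)
    mi, mj = sum(xi) / n, sum(xj) / n
    s_prime = sum(p * q for p, q in zip(xi, xj)) / n
    s = sum((p - mi) * (q - mj) for p, q in zip(xi, xj)) / n
    return s_prime, s, mi, mj

s_prime, s, mi, mj = cross_moments(200_000)
print(s - (s_prime - mi * mj))   # the identity holds up to rounding error
print(s_prime, s)                # both near sigma_ij = 0.6
```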

Using Lemma V it is easy to see that the coefficients $r_n$ of least squares equations converge with probability one to their $\beta$ values, where the $\beta$ value is obtained by substituting $\sigma_{ij}$ for $s_{ijn}$ in the expression for $r_n$, assuming, of course, independent random vectors which have the same distribution functions.

Since problems in the analysis of variance may be interpreted as problems in least squares, the above comments and Lemma V will generally make it possible, when determining limiting distributions, to consider the statistics to be functions of deviations from “true” mean functions rather than “sample” mean functions.

We shall discuss, briefly, four applications of these results. 

(a). The limiting distribution of the regression coefficient. Let $r_n$, the “sample” regression coefficient, be defined by the equation

$$r_n = \frac{\sum_\nu x_{i\nu}x_{j\nu}}{\sum_\nu x_{j\nu}^2},$$

where $x_{i\nu}$ and $x_{j\nu}$ are deviations from arithmetic means. If the random vectors $(x_{i\nu}, x_{j\nu})$ are independently distributed for fixed $i, j$, with the same distribution functions, and if $E(x_{i\nu}) = E(x_{j\nu}) = 0$, $E(x_{i\nu}x_{j\nu}) = \sigma_{ij}$, then it follows from the strong law of large numbers that $\sum_\nu x_{j\nu}^2/n$ converges to $\sigma_{jj}$ with probability one, and from the Laplace-Liapounoff theorem that $\sqrt{n}\,(\sum_\nu x_{i\nu}x_{j\nu}/n - \sigma_{ij})$ has a normal limiting distribution with mean zero and variance $E\{(x_{i\nu}x_{j\nu} - \sigma_{ij})^2\}$. Hence, by Lemma V, $\sqrt{n}\,(r_n - \sigma_{ij}/\sigma_{jj})$ has a normal limiting distribution with mean zero and variance

$$\lim_{n\to\infty} E\left\{n\left(r_n - \frac{\sigma_{ij}}{\sigma_{jj}}\right)^2\right\}$$

unless that limit does not exist.
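The convergence of $r_n$ can be watched in a short simulation. The sketch assumes, purely for illustration, an exponential $x_{j\nu}$ (so $\sigma_{jj} = 1$) and $x_{i\nu} = 0.5\,x_{j\nu}$ plus independent uniform noise, so that $\sigma_{ij}/\sigma_{jj} = 0.5$ and neither variable is normal; the function name is hypothetical. As in the text, both variables are taken as deviations from their arithmetic means.

```python
import random

def sample_regression_coefficient(n, beta=0.5, seed=2):
    """r_n = sum x_i x_j / sum x_j^2, with x_i and x_j taken as deviations
    from their arithmetic means. x_j is exponential with unit variance and
    x_i = beta * x_j + uniform(-1, 1) noise, so sigma_ij / sigma_jj = beta."""
    rng = random.Random(seed)
    xj = [rng.expovariate(1.0) for _ in range(n)]
    xi = [beta * a + rng.uniform(-1.0, 1.0) for a in xj]
    mi, mj = sum(xi) / n, sum(xj) / n
    num = sum((a - mi) * (b - mj) for a, b in zip(xi, xj))
    den = sum((b - mj) ** 2 for b in xj)
    return num / den

for n in (100, 10_000, 200_000):
    print(n, sample_regression_coefficient(n))
```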
If the $x_{j\nu}$ are not random variables then, in order to apply Corollary I with $p = 1$, it is necessary that

$$(4.1)\qquad \lim_{n\to\infty} \frac{\max_\nu x_{j\nu}^2}{\sum_\nu x_{j\nu}^2} = 0.$$

In that case, the limiting distribution of $(\sum_\nu x_{j\nu}^2)^{1/2}\,r_n$ is normal with zero mean and variance $\sigma_{ii}$. If (4.1) is not satisfied then there is no assurance, unless the $x_{i\nu}$ are normally distributed, that the limiting distribution of $(\sum_\nu x_{j\nu}^2)^{1/2}\,r_n$ is normal.
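Condition (4.1) is easy to test for a given design. The sketch below uses two hypothetical choices of fixed $x_{j\nu}$: a linear trend $x_{j\nu} = \nu$, for which the ratio equals $6n/((n+1)(2n+1)) \to 0$, and a doubling sequence $x_{j\nu} = 2^{\nu}$, for which it tends to $3/4$, so that (4.1) fails. Exact Python integers avoid floating-point overflow.

```python
def max_share(xs):
    """max_v x_v^2 / sum_v x_v^2, the ratio that (4.1) requires to tend to zero."""
    squares = [x * x for x in xs]
    return max(squares) / sum(squares)

for n in (10, 100, 1000):
    trend = list(range(1, n + 1))          # x_jv = v: a linear trend
    doubling = [2 ** v for v in range(n)]  # x_jv = 2^v, kept as exact integers
    print(n, max_share(trend), max_share(doubling))
```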

(b). The limiting distribution of the analysis of variance ratio. The tests of significance which occur in the analysis of variance depend on the ratio of two quadratic forms, $q_{1n}$ and $q_{2n}$, the denominator $q_{2n}$ having rank (or degrees of freedom) $m_{2n}$ increasing with $n$, and the numerator $q_{1n}$ having rank $m_1$ not changing with $n$, i.e.,

$$v_n = \frac{q_{1n}/m_1}{q_{2n}/m_{2n}},$$

where $q_{1n} + q_{2n} + q_{3n} = \sum_{\nu=1}^{n} x_\nu^2$ and $q_{3n}$ is a quadratic form of rank $m_{3n}$ which will be identically zero if $n = m_1 + m_{2n}$. Since²⁴ $q_{2n}$ is expressible as the variance of $x$ about a least squares equation, it follows from the previous discussion and Lemma IV that $q_{2n}/m_{2n}$ converges with probability one to $\sigma^2$ under the assumptions that the $x_\nu$ are independently distributed with zero means and variances $\sigma^2$. Hence the limiting distribution of $v_n$ will depend only on the limiting distribution of $q_{1n}$, and it will consequently be necessary to consider only the matrix of $q_{1n}$ in order to apply Corollary VI with $p = 1$. For example,²⁵ if there are $pn$ independently distributed random variables $x_{i\nu}$ with zero means and variances $\sigma^2$ arranged in $p$ blocks of $n$ random variables each, then

$$\sum_{i=1}^{p}\sum_{\nu=1}^{n}(x_{i\nu} - \bar x_n)^2 = n\sum_{i=1}^{p}(\bar x_{in} - \bar x_n)^2 + \sum_{i=1}^{p}\sum_{\nu=1}^{n}(x_{i\nu} - \bar x_{in})^2,$$

where $\bar x_{in}$ is the arithmetic mean of $x_{i1}, \ldots, x_{in}$ and $\bar x_n$ is the arithmetic mean of all the $x_{i\nu}$. Then

$$q_{1n} = n\sum_{i=1}^{p}(\bar x_{in} - \bar x_n)^2,\qquad q_{2n} = \sum_{i=1}^{p}\sum_{\nu=1}^{n}(x_{i\nu} - \bar x_{in})^2,\qquad m_1 = p - 1,\qquad m_{2n} = p(n - 1)$$





²⁴ This has been proved by Kolodziejczyk [12, p. 161].
²⁵ Other schemes are given in Fisher [8].
and the matrix of $q_{1n}$ may be obtained by substituting for the $\bar x_{in}$ and $\bar x_n$. In this case it is sufficient to express $q_{1n}$ as

$$(4.2)\qquad q_{1n} = \sum_{i,j=1}^{p} a_{ij}S_iS_j,$$

where $S_i = \sum_\nu x_{i\nu}$, $a_{ii} = (p-1)/pn$ and, for $i \ne j$, $a_{ij} = -1/pn$, to see that the condition that the maximum of the absolute values of the elements of the matrix of $q_{1n}$ approach zero as $n$ increases is satisfied. Hence, if the $x_{i\nu}$ satisfy the condition $\mathfrak{L}$, the limiting distribution of $m_1v_n$ is $G(v; p - 1, 1)$.
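The decomposition and the matrix form (4.2) can be verified on artificial data. The sketch assumes, for illustration, $p = 4$ blocks of $n = 50$ uniform observations (non-normal, zero means, common variance); the helper names are hypothetical. It checks that the two pieces add up to the total sum of squares, that $q_{1n}$ equals $\sum a_{ij}S_iS_j$, and reports the largest element $(p-1)/pn$ of the matrix of $q_{1n}$, which tends to zero as $n$ grows.

```python
import random

def anova_pieces(blocks):
    """Total sum of squares about the grand mean, q_1n and q_2n for p equal blocks."""
    p, n = len(blocks), len(blocks[0])
    block_means = [sum(b) / n for b in blocks]
    grand = sum(block_means) / p
    total = sum((x - grand) ** 2 for b in blocks for x in b)
    q1 = n * sum((m - grand) ** 2 for m in block_means)
    q2 = sum((x - m) ** 2 for b, m in zip(blocks, block_means) for x in b)
    return total, q1, q2

def q1_from_block_sums(blocks):
    """q_1n as sum_ij a_ij S_i S_j, with a_ii = (p-1)/(pn) and a_ij = -1/(pn)."""
    p, n = len(blocks), len(blocks[0])
    S = [sum(b) for b in blocks]
    return sum(((p - 1) if i == j else -1) / (p * n) * S[i] * S[j]
               for i in range(p) for j in range(p))

rng = random.Random(3)
p, n = 4, 50
blocks = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(p)]
total, q1, q2 = anova_pieces(blocks)
v_n = (q1 / (p - 1)) / (q2 / (p * (n - 1)))   # m_1 = p - 1, m_2n = p(n - 1)
print(total - (q1 + q2), q1 - q1_from_block_sums(blocks), v_n)
print("largest element of the matrix of q_1n:", (p - 1) / (p * n))
```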

Clearly, if only the rank of $q_{3n}$ increases as $n$ increases, the rank $m_2$ of $q_{2n}$ being constant, and if the maximum of the absolute values of the elements of the matrix of $q_{2n}$ also approaches zero as $n$ increases, then $v_n$ will have a limiting distribution which is the analysis of variance distribution, and the limiting distribution of $q_{1n}/(q_{1n} + q_{2n})$ will be the correlation ratio distribution.

(c). Periodogram analysis. We need only remark that the linear functions which are used in the analysis of the Schuster periodogram²⁶ meet all the requirements of Corollary I if the $x_\nu$ are independently distributed with zero means and constant variances and satisfy the condition $\mathfrak{L}$. Consequently the large sample theory of the Schuster periodogram is the same for non-normal as it is for normal distributions.
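The key property can be checked directly. The sketch assumes a common normalization of the periodogram's linear functions, $\sqrt{2/n}\cos(2\pi k\nu/n)$ and $\sqrt{2/n}\sin(2\pi k\nu/n)$ for $1 \le k < n/2$; this normalization is an assumption here and is not taken from Fisher [7]. The forms are orthonormal, and their largest coefficient is $\sqrt{2/n}$, which tends to zero.

```python
import math

def periodogram_coefficients(n):
    """Rows sqrt(2/n) cos(2 pi k v / n) and sqrt(2/n) sin(2 pi k v / n),
    v = 0..n-1, for the frequencies 1 <= k < n/2."""
    c = math.sqrt(2.0 / n)
    rows = []
    for k in range(1, (n + 1) // 2):
        rows.append([c * math.cos(2 * math.pi * k * v / n) for v in range(n)])
        rows.append([c * math.sin(2 * math.pi * k * v / n) for v in range(n)])
    return rows

def max_gram_error(rows):
    """Largest deviation of the Gram matrix of the rows from the identity."""
    err = 0.0
    for i, a in enumerate(rows):
        for j, b in enumerate(rows):
            dot = sum(x * y for x, y in zip(a, b))
            err = max(err, abs(dot - (1.0 if i == j else 0.0)))
    return err

for n in (15, 31, 63):
    rows = periodogram_coefficients(n)
    print(n, max_gram_error(rows), max(abs(x) for r in rows for x in r))
```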

(d). Multivariate analysis. We shall assume that the random vectors $X_1, X_2, \ldots$ ($X_\nu$ has components $x_{1\nu}, \ldots, x_{p\nu}$) are independently distributed, that (2.3) and (2.4) are satisfied, and that the condition $\mathfrak{L}$ is satisfied. For any fixed $n$ and $g$ we shall call the determinant $D_g$ of the forms (3.5) a generalized sum of squares, and the determinant $V_g$ of the elements $b_{ijg}/m_g$ a generalized variance. We shall say that $D_g$ and $V_g$ have rank $m_g$, and that $D_f$ and $V_f$ have rank $n_f$. If $m_g$ is constant, and if (3.7) and (3.8) are true, then clearly the limiting distribution of $D_g$ is the distribution of the generalized variance of $m_g$ vector observations²⁷ from a normal distribution with zero means and covariance parameters $\sigma_{ij}$. Under the same conditions, the limiting distribution of $D_g/V_f$ is the distribution of the generalized variance of $m_g$ vector observations from a normal distribution with zero means and covariance parameters $\delta_{ij}$. Many other similar limiting distributions are immediately derivable.
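For concreteness, a generalized variance is simply the determinant of the matrix of $b_{ij}/m$. The sketch below assumes deviations from known zero means, as in the setting above, and uses a plain Gaussian elimination for the determinant; the function names and the small data set are hypothetical. For the four vectors shown, $b_{11} = 6$, $b_{22} = 9$, $b_{12} = 4$, so the generalized variance is $1.5 \cdot 2.25 - 1 = 2.375$.

```python
def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in a]
    size, det = len(a), 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return det

def generalized_variance(vectors):
    """|b_ij / m| with b_ij = sum_v x_iv x_jv over m vector observations
    (deviations from known zero means)."""
    m, p = len(vectors), len(vectors[0])
    b = [[sum(v[i] * v[j] for v in vectors) for j in range(p)] for i in range(p)]
    return determinant([[b[i][j] / m for j in range(p)] for i in range(p)])

obs = [(1.0, 2.0), (-1.0, 0.0), (0.0, -2.0), (2.0, 1.0)]
print(generalized_variance(obs))
```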

Before completing our discussion of the limiting distributions of statistics occurring in multivariate analysis, we shall state a theorem on limiting distributions which is an obvious generalization of a theorem of Doob [4, p. 166].

Suppose that the random variables $g(n)x_{1n}, \ldots, g(n)x_{pn}$ have a distribution function $F_n(X_1, \ldots, X_p)$ which is such that

$$\lim_{n\to\infty} F_n(X_1, \ldots, X_p) = F(X_1, \ldots, X_p),$$

where $F(X_1, \ldots, X_p)$ is a continuous distribution function, and suppose that $x_{in}$ converges in probability to the real number $\xi_i$. For example, if $\bar x_n = \sum_\nu x_\nu/n$, where $E(x_\nu) = 0$, $E(x_\nu^2) = 1$, and $\mathfrak{L}$ is satisfied, then $\bar x_n$ converges to zero with probability one, and $\sqrt{n}\,\bar x_n$ has a limiting distribution which is normal with zero mean and unit variance, i.e.

$$\lim_{n\to\infty}\left|P\{\sqrt{n}\,\bar x_n < X\} - N(X; 1)\right| = 0.$$

²⁶ The theory of the Schuster periodogram is given by Fisher [7].
²⁷ See Wilks [18, p. 476] or Madow [15, pp. 481, 484].

THEOREM VII. Let $\psi_s(t_1, \ldots, t_p)$ be a function of $t_1, \ldots, t_p$ defined in a neighborhood $N$ of $\xi_1, \ldots, \xi_p$ which, together with its $(k_s + 1)$-th partial derivatives, is continuous in $N$. Suppose that $k_s$ is the least value of $r_s$ such that the random variables²⁸

$$[g(n)]^{r_s}\sum_{i_1, \ldots, i_{r_s}} \frac{1}{r_s!}\,\frac{\partial^{r_s}\psi_s(\xi_1, \ldots, \xi_p)}{\partial\xi_{i_1}\cdots\partial\xi_{i_{r_s}}}\,(x_{i_1n} - \xi_{i_1})\cdots(x_{i_{r_s}n} - \xi_{i_{r_s}})$$

have a joint limiting distribution function $D(x_1, \ldots, x_s)$. Then the random variables $[g(n)]^{k_s}[\psi_s(x_{1n}, \ldots, x_{pn}) - \psi_s(\xi_1, \ldots, \xi_p)]$ have a joint limiting distribution which is given by $D(x_1, \ldots, x_s)$. The value $k_s$ is greater than or equal to the minimum value for which not all the partial derivatives of order $k_s$ vanish at $\xi_1, \ldots, \xi_p$.
The proof is almost word for word that of Doob, the only difference being 
the removal of the specializing words. 
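Theorem VII can be illustrated with the example that precedes it, taking $g(n) = \sqrt{n}$ and $\xi = 0$. The sketch assumes uniform observations with zero mean and unit variance and two hypothetical functions: for $\psi(t) = e^t$ the first derivative at 0 is 1, so $k = 1$ and $\sqrt{n}(e^{\bar x_n} - 1)$ should be nearly standard normal; for $\psi(t) = t^2$ the first derivative vanishes, so $k = 2$ and $n\bar x_n^2$ should be nearly chi-square with one degree of freedom (mean 1).

```python
import math
import random

def scaled_means(n, reps, seed=4):
    """sqrt(n) * xbar_n for `reps` independent samples of n observations,
    each uniform on (-sqrt(3), sqrt(3)), i.e. zero mean and unit variance."""
    rng = random.Random(seed)
    a = math.sqrt(3.0)
    return [sum(rng.uniform(-a, a) for _ in range(n)) / math.sqrt(n)
            for _ in range(reps)]

def average(values):
    return sum(values) / len(values)

n, reps = 200, 2000
z = scaled_means(n, reps)          # g(n) * (xbar_n - xi) with g(n) = sqrt(n), xi = 0
xbar = [t / math.sqrt(n) for t in z]

# psi(t) = exp(t): psi'(0) = 1, so k = 1.
first_order = [math.sqrt(n) * (math.exp(x) - 1.0) for x in xbar]
# psi(t) = t^2: psi'(0) = 0 and psi''(0) = 2, so k = 2.
second_order = [n * x * x for x in xbar]

print(average(first_order), average([t * t for t in first_order]), average(second_order))
```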
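We now consider the limiting distribution of the ratio of generalized sums of squares $L_k$, which is defined by

$$L_k = \frac{D_k}{D_{k,k+1}},$$

where $D_{k,k+1}$ is the determinant of the forms $b_{ijk} + b_{ij,k+1} = b_{ij,k,k+1}$. It has been shown that²⁹

$$L_k = \prod_{r=1}^{p}\frac{Y_{kr}}{Y_{(k,k+1)r}},$$

where $Y_{jr}$, for $j = k$ and for $j = (k, k+1)$, is a ratio of generalized sums of squares,

$$Y_{jr} = \frac{|b_{uvj}|_{u,v = 1,\ldots,r}}{|b_{uvj}|_{u,v = 1,\ldots,r-1}},$$

the determinant of order zero being taken as 1. Since $Y_{jr}/m_j$ converges with probability one to $|\sigma_{uv}|_{u,v \le r}/|\sigma_{uv}|_{u,v \le r-1}$, and since, by Corollary VIII, the joint limiting distribution of the [expression partly illegible in the source] is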
²⁸ See Goursat-Hedrick [10, p. 107] for a statement of the Taylor expansion of functions of several variables, which we use here. By $\partial\psi_s(\xi_1, \ldots, \xi_p)/\partial\xi_i$ is meant the value of $\partial\psi_s(t_1, \ldots, t_p)/\partial t_i$ at the point $\xi_1, \ldots, \xi_p$.
²⁹ See Madow [15, p. 485].
$\prod_{r=1}^{p} G(x_r; m, 1)$, it follows, by Theorem VII, that the joint limiting distribution of the ratios of generalized sums of squares [expression illegible in the source] is

$$\prod_{r=1}^{p} G(x_r; m, 1),$$

and that the limiting distribution of $m_k(1 - L_k)$ is³⁰

$$G(x; pm, 1).$$


In a following paper, these results will be extended to quadratic forms in 
non-central random variables. 


5. Summary. In Section 2, Theorem I, we stated a very general form of the Laplace-Liapounoff theorem based on the Lindeberg condition. In four corollaries, this theorem was shown to provide joint limiting distributions for systems of linear forms which are such that the maximum of the absolute values of their coefficients converges to zero with an increase in the size of the sample if the coefficients are constants, and converges in probability to zero with an increase in the size of the sample if the coefficients are themselves random variables. It was shown that under certain conditions functions of several random variables, which are such that each function is a linear function of certain random variables for fixed values of random variables of lower index, also have a normal multivariate limiting distribution.

These results were extended to include limiting distributions of quadratic and bilinear forms in Section 3. The method of extension was to show that necessary and sufficient conditions for the existence of systems of linear forms satisfying the conditions of Section 2 are provided by rather simple conditions, the most important of which is that the greatest of the absolute values of the elements of the matrices of the quadratic and bilinear forms approach zero as the size of the sample increases, the ranks of the forms remaining unaltered. This led to the theorem that quadratic and bilinear forms having such matrices have $\chi^2$, or covariance, or Wishart's distributions as limiting distributions. It was then shown, in Theorem IV, that if the rank of the sum of the matrices of the quadratic and bilinear forms is equal to the sum of the ranks of the matrices, and if certain of these ranks do not change as the size of the sample increases, then the system of quadratic and bilinear forms has Wishart's distribution in the limit provided the other conditions are met. These results


³⁰ A generalization of Wilks' result [19, p. 323] to the case where the variates are not assumed to have a normal multivariate distribution may readily be obtained.
were then extended in Theorem V to one of the cases occurring when the coefficients of the forms are themselves random variables.

Several simple illustrations of the uses of the methods were given in Section 4. It was shown that the analysis of variance ratios, and statistics occurring in the theory of multivariate statistical analysis, have the same limiting distributions which they would have had if their variables had been normally and independently distributed.
ON A TEST WHETHER TWO SAMPLES ARE FROM THE SAME
POPULATION¹


By A. WALD² AND J. WOLFOWITZ


1. The Problem.³ Let X and Y be two independent stochastic variables about whose cumulative distribution functions nothing is known except that they are continuous. Let $x_1, x_2, \ldots, x_m$ be a set of $m$ independent observations on X and let $y_1, \ldots, y_n$ be a set of $n$ independent observations on Y. It is desired to test the hypothesis (the null hypothesis) that the distribution functions of X and Y are identical.

An important step in statistical theory was made when “Student” proposed 
his ratio of mean to standard deviation for a similar purpose. In the problem 
treated by “Student” the distribution functions were assumed to be of known 
(normal) form and completely specified by two parameters. It is clear that in 
the problem to be considered here the distributions cannot be specified by any 
finite number of parameters. 

It might nevertheless be argued that, by virtue of the limit theorems of probability theory, “Student’s” ratio might be used in our problem for large samples. Such a procedure is open to very serious objections. The population distributions may be of such form (e.g., Cauchy distribution) that the limit theorems do not apply. Furthermore, the distributions of X and Y may be radically different and yet have the same first two moments; clearly “Student’s” ratio will not distinguish between two such distributions.

The Pearson contingency coefficient is a useful test specifically designed for the problem we are discussing here, but one which also possesses some disadvantages. The location of the class intervals is to a considerable extent arbitrary. In order to use the $\chi^2$ distribution, the numbers in each class interval must not be small; often this can be done only by having large class intervals, thus entailing a loss of information.


2. Preliminary remarks. Denote by $P\{X < x\}$ the probability of the relation in braces. Let $f(x)$ and $g(x)$ be the distribution functions of X and Y respectively; e.g., $P\{X < x\} = f(x)$. Throughout this paper we shall assume that $f(x)$ and $g(x)$ are continuous.

Let the set of $m + n$ elements $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be arranged in


¹ Presented to the Institute of Mathematical Statistics at Philadelphia, December 27, 1939.
² Research under a grant-in-aid from the Carnegie Corporation of New York.
³ The authors are indebted to Prof. S. S. Wilks for proposing this problem to them.
ascending order of magnitude, and let the sequence be designated by Z, thus: $Z = z_1, z_2, \ldots, z_{m+n}$, where $z_1 < z_2 < \cdots < z_{m+n}$. ($f(x)$ and $g(x)$ were assumed to be continuous. Hence the probability is 0 that $z_i = z_{i+1}$, and therefore we may exclude this case.) Let $V = v_1, v_2, \ldots, v_{m+n}$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \ldots, x_m$ and $v_i = 1$ if $z_i$ is a member of the set $y_1, \ldots, y_n$. It is easy to show that any statistic S used to test the null hypothesis should be invariant under any continuous, reciprocally one-to-one transformation of the real axis. That is to say, if $t' = \varphi(t)$ is any such transformation, then

$$(1)\qquad S(x_1, \ldots, x_m, y_1, \ldots, y_n) = S(\varphi(x_1), \ldots, \varphi(x_m), \varphi(y_1), \ldots, \varphi(y_n)).$$

The reason for this requirement on S is the fact that the transformed stochastic variables $X' = \varphi(X)$ and $Y' = \varphi(Y)$ are continuous and have identical distributions if and only if X and Y have identical distributions. Hence S must be a function of V only, with the added restriction that $S(V) = S(V')$, where $V' = v_{m+n}, v_{m+n-1}, \ldots, v_1$. For if S were a function of $x_1, \ldots, x_m, y_1, \ldots, y_n$ which cannot be expressed as a function of V alone, then there exists a continuous reciprocally one-to-one transformation $t' = \varphi(t)$ such that (1) is not true. On the other hand, any continuous reciprocally one-to-one transformation of the entire line into itself is monotonic and hence either leaves V invariant or else transforms it into $V'$.
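The argument above can be checked mechanically: pooling and sorting the observations and recording only the labels gives V, and a continuous one-to-one map of the line either preserves V (if increasing) or reverses it (if decreasing). The sketch assumes normal samples and two hypothetical maps, $t + t^3$ and $-t^3$, both bijections of the real line; the function names are illustrative.

```python
import random

def indicator_sequence(xs, ys):
    """V: v_i = 0 if the i-th smallest pooled observation is an x, 1 if it is a y."""
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    return [label for _, label in pooled]

def increasing(t):
    return t + t ** 3        # continuous, one-to-one, increasing map of the line onto itself

def decreasing(t):
    return -t ** 3           # continuous, one-to-one, decreasing map of the line onto itself

rng = random.Random(5)
xs = [rng.gauss(0.0, 1.0) for _ in range(6)]
ys = [rng.gauss(0.5, 1.0) for _ in range(8)]
V = indicator_sequence(xs, ys)
print(V)
print(indicator_sequence([increasing(x) for x in xs], [increasing(y) for y in ys]))  # equals V
print(indicator_sequence([decreasing(x) for x in xs], [decreasing(y) for y in ys]))  # equals V'
```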


3. Previous results. In an interesting paper on this problem W. R. Thompson [1] proceeds as follows: Let the sets $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be ordered in ascending order of magnitude, thus: $x_{p_1}, x_{p_2}, \ldots, x_{p_m}$ and $y_{p'_1}, y_{p'_2}, \ldots, y_{p'_n}$, where $x_{p_1} < x_{p_2} < \cdots < x_{p_m}$ and $y_{p'_1} < y_{p'_2} < \cdots < y_{p'_n}$. Let $P\{x_{p_k} < y_{p'_{k'}}\}$ denote the probability of the relation in braces under the null hypothesis ($f(x) \equiv g(x)$). This probability is shown to be independent of $f(x)$, and the relation

$$(2)\qquad P\{x_{p_k} < y_{p'_{k'}}\} = \psi(m, n, k, k')$$

holds, where the right member, which is given explicitly by Thompson, is a function only of the arguments exhibited. To make a test of the null hypothesis with, say, a 5% level of significance, this writer proposes to choose $k$ and $k'$ so that $\psi(m, n, k, k') = .05$. The test would then consist of noticing whether $x_{p_k} < y_{p'_{k'}}$ or not. In the former case the null hypothesis is to be considered as disproved.

It is clear that this test cannot be very efficient, ignoring as it does so many of the relations among the observations. Except under certain rather narrow restrictions on the admissible alternatives, for example, that $g(x) = f(x + c)$, where $c$ is an arbitrary constant, the test suffers the further defect of not being “consistent” in a way which will be discussed below. Hence the test suggested by Thompson can scarcely be regarded as a satisfactory solution of the problem. This criticism, of course, does not apply to those sections of Thompson’s paper which deal with the question of estimating the so-called normal range.
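The distribution-free character of (2) is easy to see computationally. The $k$-th smallest $x$ precedes the $k'$-th smallest $y$ exactly when at least $k$ of the first $k + k' - 1$ pooled observations are $x$'s, and under the null hypothesis that count is hypergeometric. The sketch below computes $\psi$ this way and checks it by brute-force enumeration. It is our own illustration of the probability in (2), not Thompson's explicit formula; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def psi(m, n, k, k2):
    """P{k-th smallest x < k2-th smallest y} under f = g: at least k of the
    first k + k2 - 1 pooled observations must be x's (hypergeometric count)."""
    N = k + k2 - 1
    return sum(Fraction(comb(m, j) * comb(n, N - j), comb(m + n, N))
               for j in range(k, min(m, N) + 1))

def psi_brute_force(m, n, k, k2):
    """Same probability by enumerating all C(m+n, m) equally likely placements of the x's."""
    hits = total = 0
    for x_positions in combinations(range(m + n), m):
        y_positions = sorted(set(range(m + n)) - set(x_positions))
        hits += x_positions[k - 1] < y_positions[k2 - 1]
        total += 1
    return Fraction(hits, total)

print(psi(3, 3, 1, 1), psi(5, 7, 4, 2), psi_brute_force(5, 7, 4, 2))
```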
4. The statistic U. A subsequence $v_{s+1}, v_{s+2}, \ldots, v_{s+r}$ of V (where $r$ may also be 1) will be called a “run” if $v_{s+1} = v_{s+2} = \cdots = v_{s+r}$, and if $v_s \ne v_{s+1}$ when $s > 0$, and if $v_{s+r} \ne v_{s+r+1}$ when $s + r < m + n$. For example, V = 1, 0, 0, 1, 1, 0 contains the following runs: 1; 0, 0; 1, 1; 0. The statistic⁴ U, defined as the number of runs in V, seems a suitable statistic for testing the hypothesis that $f(x) \equiv g(x)$. In the event that the latter identity holds, the distribution of U is independent of $f(x)$. A difference between $f(x)$ and $g(x)$ tends to decrease U. U is consistent in a sense which will be discussed below.
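Counting runs amounts to counting maximal blocks of equal consecutive elements, which `itertools.groupby` yields directly. A minimal sketch (function names hypothetical), applied to the example sequence above:

```python
from itertools import groupby

def number_of_runs(v):
    """U: the number of maximal blocks of equal consecutive elements of V."""
    return sum(1 for _ in groupby(v))

def runs(v):
    """The runs themselves, as lists."""
    return [list(group) for _, group in groupby(v)]

V = [1, 0, 0, 1, 1, 0]
print(runs(V), number_of_runs(V))
```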

In order to derive the distribution of U under the null hypothesis, we first note that all the $\frac{(m+n)!}{m!\,n!}$ $(= {}^{m+n}C_m)$ possible sequences V have the same probability $\left(\frac{(m+n)!}{m!\,n!}\right)^{-1}$. To see this, consider the sequence V where $v_i = 0$ ($i = 1, 2, \ldots, m$) and $v_i = 1$ ($i = m+1, m+2, \ldots, m+n$). Clearly the probability of the sequence is

$$q = \frac{m(m-1)\cdots 1\cdot n(n-1)\cdots 1}{(m+n)(m+n-1)\cdots(n+1)\,n(n-1)\cdots 1}.$$

Furthermore, the probability of any other sequence is equal to the product of the factors in the numerator of $q$ taken in a different order, divided by the product of the factors in the denominator taken in the same order. The quotient is, of course, $= q$.

Let $e_0$ be the number of runs in V whose elements are 0 and let $e_1$ be the number of runs whose elements are 1. Obviously $U = e_0 + e_1$. Let the runs of each kind be arranged in the ascending order of the indices of the $v_i$. Let $r_{0j}$ be the number of elements 0 in the $j$th run of that kind ($j = 1, 2, \ldots, e_0$) and let $r_{1j'}$ be the number of elements 1 in the $j'$th run of that kind ($j' = 1, 2, \ldots, e_1$). The following relations obviously hold:

$$(3)\qquad \sum_{j=1}^{e_0} r_{0j} = m,$$

$$(4)\qquad \sum_{j'=1}^{e_1} r_{1j'} = n,$$

$$(5)\qquad 1 \le e_0 \le m,\qquad 1 \le e_1 \le n,$$

$$(6)\qquad |e_0 - e_1| \le 1.$$

⁴ When this paper was already in proof, our attention was called to a paper by W. L. Stevens, entitled “Distribution of groups in a sequence of alternatives,” Annals of Eugenics, Vol. 9 (1939). There a statistic, which is essentially the U statistic, is proposed for a problem different from that considered by us and the distribution of U is obtained in a different manner. However, the application of the U statistic for the purpose herein described, the proof of consistency and the other results of our paper are not contained in it.








150 A. WALD AND J. WOLFOWITZ 





Hence if $U = 2k$, then $e_0 = e_1 = k$, and if $U = 2k - 1$, then either $e_0 = k$, $e_1 = k - 1$ or $e_0 = k - 1$, $e_1 = k$. The element $v_1$ of V together with the numbers $r_{01}, r_{02}, \ldots, r_{0e_0}, r_{11}, r_{12}, \ldots, r_{1e_1}$ completely determines the sequence V, whose probability is $q$.

Without loss of generality we may assume that $m \le n$. If $U = 2k$, $1 \le k \le m$, $v_1 = 0$, any two sequences of $k$ positive numbers each may constitute a sequence of $r_{01}, \ldots, r_{0k}, r_{11}, \ldots, r_{1k}$, provided only that (3) and (4) are satisfied. The number of sequences $r_{01}, r_{02}, \ldots, r_{0k}$ which satisfy (3) is the coefficient of $a^m$ in the purely formal expansion of

$$(a + a^2 + a^3 + \cdots)^k = \left(\frac{a}{1-a}\right)^k,$$

and hence is ${}^{m-1}C_{k-1}$. Similarly the number of sequences $r_{11}, r_{12}, \ldots, r_{1k}$ which satisfy (4) is found to be ${}^{n-1}C_{k-1}$. Bearing in mind the case $U = 2k$, $v_1 = 1$, we obtain

$$(7)\qquad P\{U = 2k\} = \frac{2\;{}^{m-1}C_{k-1}\;{}^{n-1}C_{k-1}}{{}^{m+n}C_m}\qquad (k = 1, 2, \ldots, m),$$

where the left member denotes the probability of the relation in braces under the null hypothesis. In a similar manner we obtain

$$(8)\qquad P\{U = 2k - 1\} = \frac{{}^{m-1}C_{k-1}\;{}^{n-1}C_{k-2} + {}^{m-1}C_{k-2}\;{}^{n-1}C_{k-1}}{{}^{m+n}C_m}\qquad (k = 2, \ldots, m+1),$$

with the proviso that ${}^{a}C_{b} = 0$ if $a < b$.
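Formulas (7) and (8) can be checked against a direct enumeration of all ${}^{m+n}C_m$ equally likely sequences V. The sketch below does this with exact fractions; the helper names are hypothetical. It does not rely on $m \le n$, since (7) and (8) are symmetric in $m$ and $n$.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb

def C(a, b):
    """^aC_b, taken to be 0 if a < b or b < 0."""
    return comb(a, b) if 0 <= b <= a else 0

def runs_pmf(m, n):
    """P{U = u} under the null hypothesis from (7) and (8)."""
    total = comb(m + n, m)
    pmf = {}
    for k in range(1, min(m, n) + 2):
        even = 2 * C(m - 1, k - 1) * C(n - 1, k - 1)
        odd = C(m - 1, k - 1) * C(n - 1, k - 2) + C(m - 1, k - 2) * C(n - 1, k - 1)
        if even:
            pmf[2 * k] = Fraction(even, total)
        if odd:
            pmf[2 * k - 1] = Fraction(odd, total)
    return pmf

def runs_pmf_brute_force(m, n):
    """Enumerate every placement of the m zeros among the m + n positions."""
    counts = {}
    for zeros in combinations(range(m + n), m):
        v = [0 if i in zeros else 1 for i in range(m + n)]
        u = sum(1 for _ in groupby(v))
        counts[u] = counts.get(u, 0) + 1
    total = comb(m + n, m)
    return {u: Fraction(c, total) for u, c in counts.items()}

print(runs_pmf(3, 4))
```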

We shall now briefly indicate a method of obtaining the mean $E(U)$ and variance $\sigma^2(U)$ of U. For example, $E(U)$ may be obtained by performing several summations of the type

$$(9)\qquad \sum_{k=0}^{m-1} k\;{}^{m-1}C_k\;{}^{n-1}C_k.$$

It is easy to verify that the expression (9) is the term free of $a$ in the purely formal expansion in $a$ of

$$(10)\qquad (m-1)\,(1+a)^{m-2}\,a\,(1+a^{-1})^{n-1},$$

and hence is

$$(11)\qquad (m-1)\;{}^{m+n-3}C_{m-1}.$$
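The passage from (9) to (11) can be confirmed numerically. The sketch below reads (9) as $\sum_k k\,{}^{m-1}C_k\,{}^{n-1}C_k$ (our reading of a damaged passage, suggested by the factor $m - 1$ in (10)) and compares it with $(m-1)\,{}^{m+n-3}C_{m-1}$ for small $m, n \ge 2$; the function names are hypothetical. The agreement follows from $k\,{}^{m-1}C_k = (m-1)\,{}^{m-2}C_{k-1}$ and Vandermonde's identity.

```python
from math import comb

def summation_9(m, n):
    """sum_{k=0}^{m-1} k * ^{m-1}C_k * ^{n-1}C_k, computed term by term."""
    return sum(k * comb(m - 1, k) * comb(n - 1, k) for k in range(m))

def closed_form_11(m, n):
    """(m - 1) * ^{m+n-3}C_{m-1}: the term free of a in (10)."""
    return (m - 1) * comb(m + n - 3, m - 1)

print([(m, n, summation_9(m, n), closed_form_11(m, n)) for m in (2, 3, 5) for n in (2, 4, 7)])
```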
The other summations required for the mean and variance can be carried out in a similar manner. We shall omit these tedious calculations. The results are:

$$(12)\qquad E(U) = \frac{2mn}{m+n} + 1,$$

$$(13)\qquad \sigma^2(U) = \frac{2mn(2mn - m - n)}{(m+n)^2(m+n-1)}.$$

The critical region for testing the null hypothesis on a level of significance $\beta$ is given by the inequality $U < u_\beta$, where $u_\beta$ is a function of $m$ and $n$ such that $P\{U < u_\beta\} = \beta$.
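Formulas (12) and (13) can be checked exactly for small samples by enumerating all equally likely sequences V, independently of (7) and (8). A minimal sketch with exact fractions (function names hypothetical):

```python
from fractions import Fraction
from itertools import combinations, groupby

def runs_moments_exact(m, n):
    """E(U) and sigma^2(U) by enumerating all ^{m+n}C_m equally likely sequences V."""
    us = []
    for zeros in combinations(range(m + n), m):
        v = [0 if i in zeros else 1 for i in range(m + n)]
        us.append(sum(1 for _ in groupby(v)))
    total = len(us)
    mean = Fraction(sum(us), total)
    var = Fraction(sum(u * u for u in us), total) - mean ** 2
    return mean, var

def runs_moments_formula(m, n):
    """Formulas (12) and (13)."""
    mean = Fraction(2 * m * n, m + n) + 1
    var = Fraction(2 * m * n * (2 * m * n - m - n), (m + n) ** 2 * (m + n - 1))
    return mean, var

print(runs_moments_exact(4, 6), runs_moments_formula(4, 6))
```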


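5. The asymptotic distribution of U. Let $m/n = \alpha$, a positive constant. Then, as $m \to \infty$,

$$E(U) \sim \frac{2m}{1+\alpha},\qquad \sigma^2(U) \sim \frac{4\alpha m}{(1+\alpha)^3}.$$

THEOREM I. If $t$ is any real number, the probability of the relation

$$U < \frac{2m}{1+\alpha} + 2\sqrt{\frac{\alpha m}{(1+\alpha)^3}}\;t$$

converges uniformly in $t$ to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{1}{2}x^2}\,dx$$

as $m \to \infty$.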

The proof of this theorem is essentially the same as the classical proof that the binomial law converges to the normal distribution (see, for example, Fréchet [2, p. 89]), and it will be unnecessary to give the details. Since the asymptotic distribution of the subpopulation of even U is the same as that of odd U, it will be sufficient to consider only the right member of (7). Let $m' = m - 1$, $n' = n - 1$, and $k' = k - 1$. We make the substitution

$$(14)\qquad k' = \frac{m'}{1+\alpha'} + w\sqrt{m'},$$

where

$$(15)\qquad \alpha' = \frac{m'}{n'},$$

and evaluate the factorials by Stirling's formula. We shall give here only the results of successive simplifications. At each step we shall omit the factors free of $k$ or $w$, since their product may be reconstructed from the final exponential form. Thus instead of the right member of (7) we can consider the expression:

$$(16)\qquad {}^{m'}C_{k'}\;{}^{n'}C_{k'}.$$
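Omitting factors free of $k$, we get

$$(17)\qquad \frac{1}{k'!\,(m'-k')!\,k'!\,(n'-k')!},$$

and by Stirling's formula, since $k$ and $m$ are both large:

$$(18)\qquad \frac{1}{k'^{\,2k'+1}\,(m'-k')^{m'-k'+\frac{1}{2}}\,(n'-k')^{n'-k'+\frac{1}{2}}}.$$

Now apply (14). We obtain

$$(19)\qquad \frac{1}{\left(\frac{m'}{1+\alpha'} + w\sqrt{m'}\right)^{\frac{2m'}{1+\alpha'} + 2w\sqrt{m'} + 1}\left(\frac{m'\alpha'}{1+\alpha'} - w\sqrt{m'}\right)^{\frac{m'\alpha'}{1+\alpha'} - w\sqrt{m'} + \frac{1}{2}}\left(\frac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'}\right)^{\frac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'} + \frac{1}{2}}}.$$

Dividing inside the parentheses by $\frac{m'}{1+\alpha'}$, $\frac{m'\alpha'}{1+\alpha'}$ and $\frac{m'}{\alpha'(1+\alpha')}$ respectively, and again omitting factors free of $w$, we get

$$(20)\qquad \frac{1}{\left(1 + \frac{(1+\alpha')w}{\sqrt{m'}}\right)^{\frac{2m'}{1+\alpha'} + 2w\sqrt{m'} + 1}\left(1 - \frac{(1+\alpha')w}{\alpha'\sqrt{m'}}\right)^{\frac{m'\alpha'}{1+\alpha'} - w\sqrt{m'} + \frac{1}{2}}\left(1 - \frac{\alpha'(1+\alpha')w}{\sqrt{m'}}\right)^{\frac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'} + \frac{1}{2}}}.$$

Taking logarithms, expanding in powers of $w/\sqrt{m'}$ and neglecting terms in $1/\sqrt{m'}$ and higher orders, the results are

$$(21)\qquad -\left(\frac{2m'}{1+\alpha'} + 2w\sqrt{m'}\right)\left(\frac{(1+\alpha')w}{\sqrt{m'}} - \frac{(1+\alpha')^2w^2}{2m'}\right) + \left(\frac{m'\alpha'}{1+\alpha'} - w\sqrt{m'}\right)\left(\frac{(1+\alpha')w}{\alpha'\sqrt{m'}} + \frac{(1+\alpha')^2w^2}{2\alpha'^2m'}\right) + \left(\frac{m'}{\alpha'(1+\alpha')} - w\sqrt{m'}\right)\left(\frac{\alpha'(1+\alpha')w}{\sqrt{m'}} + \frac{\alpha'^2(1+\alpha')^2w^2}{2m'}\right),$$

which equals

$$(22)\qquad -\frac{(1+\alpha')^3}{2\alpha'}\,w^2 + O\!\left(\frac{1}{\sqrt{m'}}\right).$$

The proof of the fact that the distribution of $w$ converges uniformly to the normal distribution with zero mean and variance $\frac{\alpha'}{(1+\alpha')^3}$ can be carried out in the same way as the classical proof that the binomial law converges to the normal distribution.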
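The expansion leading to (22) can be checked numerically by evaluating the logarithm of (16) exactly with the log-gamma function, which treats $k'$ as a real number, and comparing it with the quadratic term of (22). The sketch below is such a check under illustrative choices of $m'$, $n'$ and $w$; the function names are hypothetical. The discrepancy should shrink like $1/\sqrt{m'}$.

```python
import math

def log_binom(a, b):
    """log of ^aC_b, extended to real b through the log-gamma function."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def exact_log_term(m1, n1, w):
    """log(^{m'}C_{k'} ^{n'}C_{k'}) at k' = m'/(1 + a') + w sqrt(m'), minus its value at w = 0."""
    alpha1 = m1 / n1
    centre = m1 / (1 + alpha1)
    def log_16(k):
        return log_binom(m1, k) + log_binom(n1, k)
    return log_16(centre + w * math.sqrt(m1)) - log_16(centre)

def quadratic_term(m1, n1, w):
    """The leading term of (22): -(1 + a')^3 w^2 / (2 a')."""
    alpha1 = m1 / n1
    return -(1 + alpha1) ** 3 * w * w / (2 * alpha1)

for m1, n1 in ((400, 800), (40_000, 80_000)):
    for w in (-0.5, 0.3):
        print(m1, n1, w, exact_log_term(m1, n1, w), quadratic_term(m1, n1, w))
```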
It is obvious that

$$w^* = \frac{k - \frac{m}{1+\alpha}}{\sqrt{m}}$$

has the same limiting distribution as $w$. From this and from the fact that $U = 2k$ or $2k - 1$, Theorem I follows.

In using conventional tables of the Gaussian function to make tests of significance on U when $m$ and $n$ are large, the reader is urged not to forget that the critical region of U lies in only one tail of the curve.


6. An example. We give here a simple example illustrating the use of the statistic U and Theorem I.

Suppose 50 observations were made on X and 50 observations on Y. Suppose further that these observations are arranged in ascending order and that the $i$th element of this sequence is said to have the rank $i$. The observations on X occupy the following ranks: 1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26, 27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58, 62, 63, 64, 65, 68, 69, 75, 79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95.

The observations on Y occupy the remaining ranks.

In this case, U = 34.

For m = n = 50,

E(U) = 51,

σ²(U) = 24.747, so that σ(U) ≈ 4.975.

The probability of getting 34 runs or less when the distribution functions of X and Y are continuous and identical is therefore less than 5·10⁻⁴.
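The figures in this example can be reproduced in a few lines: rebuild V from the listed ranks of the X observations, count the runs, apply (12) and (13), and evaluate the lower normal tail suggested by Theorem I. The sketch uses no continuity correction, which is our choice rather than the authors'; it relies only on the standard library.

```python
import math
from itertools import groupby

x_ranks = {1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26, 27, 28, 31, 32, 38,
           42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58, 62, 63, 64, 65, 68, 69, 75,
           79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95}
m = len(x_ranks)
n = 100 - m
V = [0 if r in x_ranks else 1 for r in range(1, 101)]
U = sum(1 for _ in groupby(V))

mean = 2 * m * n / (m + n) + 1                                          # (12)
var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))  # (13)
z = (U - mean) / math.sqrt(var)
lower_tail = 0.5 * math.erfc(-z / math.sqrt(2))   # standard normal P{Z <= z}
print(m, n, U, mean, round(var, 3), round(z, 3), lower_tail)
```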


7. Consistency. We shall say that a test is “consistent” if the probability of rejecting the null hypothesis when it is false (i.e., the complement of the probability of a type II error, cf. Neyman and Pearson [3]) approaches one as the sample number approaches infinity. In the literature of statistics a function of the observations which converges stochastically to a population parameter as the sample number approaches infinity is called a “consistent” statistic. If a test of a hypothesis about a population parameter is made by a proper use of a consistent (statistic) estimate of the parameter, the test will be consistent also according to our definition, which thus furnishes an extension of the idea of consistency to the case where the alternatives to the null hypothesis cannot be specified by a finite number of parameters.

It is obvious that consistency ought to be a minimal requirement of any good test. It is the purpose of this section to prove that, subject to some slight and, from the practical statistical point of view, unimportant restrictions on the distribution functions, the test furnished by the statistic U is consistent.

We shall say that the distribution functions $f(x)$ and $g(x)$ satisfy the condition A if, for any arbitrarily small positive $\delta$, there exist a finite number of closed intervals such that the probability of the sum $I$ of these intervals is $> 1 - \delta$ according to at least one of the distribution functions $f(x)$ and $g(x)$, and such that $f(x)$ and $g(x)$ have positive continuous derivatives $f'(x)$ and $g'(x)$ in $I$.

In all that follows, although $m$ and $n$ are considered as variables, their ratio $m/n$ is to be a constant, denoted by $\alpha$. Let $\beta > 0$ denote the level of significance on which the test is to be made, so that, if $f(x) \equiv g(x)$,

$$(23)\qquad P\{U < u(m)\} = \beta,$$

where the critical region for two samples of size $m$ and $n$, respectively, is given by $U < u(m)$.
THEOREM II. If $f(x)$ and $g(x)$ satisfy condition A, and if

$$(24)\qquad f(x) \not\equiv g(x),$$

then

$$(25)\qquad \lim_{m\to\infty} P\{U < u(m)\} = 1.$$

The proof of this theorem will be given in several stages.
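Theorem II can be watched at work in a small Monte Carlo sketch, which is not part of the authors' argument. It assumes X uniform on $(0, 1)$, a hypothetical alternative $g(x) = x^2$ on $(0, 1)$ (which satisfies condition A), $m = n$, and a critical value taken from the normal approximation of Theorem I rather than the exact distribution. The rejection rate under the alternative should climb towards 1 as $m$ grows, while under $f \equiv g$ it stays near the nominal 5%.

```python
import math
import random
from itertools import groupby
from statistics import NormalDist

def runs_statistic(xs, ys):
    """U: number of runs in the pooled, sorted sequence labelled 0 (x) / 1 (y)."""
    labels = [label for _, label in sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])]
    return sum(1 for _ in groupby(labels))

def rejection_rate(m, n, draw_y, reps=50, beta=0.05, seed=6):
    """Share of replications with U below a normal-approximation critical value
    built from (12), (13) and Theorem I. X is uniform on (0, 1); draw_y(rng)
    returns one observation on Y."""
    rng = random.Random(seed)
    mean = 2 * m * n / (m + n) + 1
    var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    critical = mean + NormalDist().inv_cdf(beta) * math.sqrt(var)
    rejected = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(m)]
        ys = [draw_y(rng) for _ in range(n)]
        rejected += runs_statistic(xs, ys) < critical
    return rejected / reps

def draw_null(rng):
    return rng.random()               # g = f: uniform on (0, 1)

def draw_alternative(rng):
    return math.sqrt(rng.random())    # g(x) = x^2 on (0, 1), a hypothetical alternative

for m in (50, 500, 2000):
    print(m, rejection_rate(m, m, draw_alternative), rejection_rate(m, m, draw_null))
```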


Let $E\left(\frac{U}{m}; f; g\right)$ and $\sigma^2\left(\frac{U}{m}; f; g\right)$ denote the mean and variance, respectively, of $\frac{U}{m}$ when X and Y have the distribution functions $f(x)$ and $g(x)$, respectively, and the sample numbers are $m$ and $n$. Let the set $x_1, \ldots, x_m;\ y_1, \ldots, y_n$ be arranged in ascending order of magnitude, thus:

$$(26)\qquad Z = z_1, z_2, \ldots, z_{m+n},$$

where $z_1 < z_2 < \cdots < z_{m+n}$. The sequence

$$(27)\qquad V = v_1, v_2, \ldots, v_{m+n}$$

is defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \ldots, x_m$ and $v_i = 1$ if $z_i$ is a member of the set $y_1, \ldots, y_n$.


LEMMA 1. If the following are fulfilled:

a) $f(x) = 0$ for $x < 0$, $f(x) = x$ for $0 \le x \le 1$, $f(x) = 1$ for $x > 1$;

b) $g(x) = 0$ for $x < 0$, $g(x) = 1$ for $x > 1$;

c) the derivative $g'(x)$ of $g(x)$ exists, is continuous and positive everywhere in the interval $0 \le x \le 1$;
d) $k$ is an arbitrary but fixed positive integer. For every $m$, $i_{1m} < i_{2m} < \cdots < i_{km}$ are a set of $k$ positive integers subject only to the restriction that the least upper bound $\gamma$ of the sequence $\frac{i_{km}}{m+n}$ ($m = 1, 2, \ldots$) is less than 1.

Then the expected value

$$E\left(\prod_{j=1}^{k} v_{i_{jm}}\right)$$

satisfies the inequality

$$(28)\qquad \left|E\left(\prod_{j=1}^{k} v_{i_{jm}}\right) - \prod_{j=1}^{k}\frac{g'(a_{jm})}{\alpha + g'(a_{jm})}\right| < \varphi(m),$$

where $\lambda_{jm} = \frac{i_{jm}}{m+n}$ and $a_{jm}$ ($j = 1, \ldots, k$) is the root of

$$(29)\qquad m\,a_{jm} + n\,g(a_{jm}) = \lambda_{jm}(m+n),$$

and $\varphi(m)$ depends only on $m$ and is such that

$$(30)\qquad \lim_{m\to\infty}\varphi(m) = 0.$$

It is easy to verify that the root $a_{jm}$ of (29) exists and is unique.
PROOF: It will be sufficient to show that, for any specified set of values of

$$v_{i_{1m}}, \ldots, v_{i_{(r-1)m}}, v_{i_{(r+1)m}}, \ldots, v_{i_{km}}\qquad (r = 1, 2, \ldots, k),$$

the conditional probability $P\{v_{i_{rm}} = 1\}$ of the relation in braces satisfies the inequality

$$(31)\qquad \left|\frac{g'(a_{rm})}{\alpha + g'(a_{rm})} - P\{v_{i_{rm}} = 1\}\right| < \psi(m),$$

where $\psi(m)$ depends only on $m$ and is such that

$$(32)\qquad \lim_{m\to\infty}\psi(m) = 0.$$

For each $m$ let

$$(33)\qquad v'_{i_{1m}}, v'_{i_{2m}}, \ldots, v'_{i_{(r-1)m}}, v'_{i_{(r+1)m}}, \ldots, v'_{i_{km}}$$

be a fixed sequence whose elements are either 0 or 1. We shall consider the conditional probability $P\{v_{i_{rm}} = s\}$ ($s = 0, 1$) of the relation in braces subject to the condition that

$$(34)\qquad v_{i_{jm}} = v'_{i_{jm}}\qquad (j = 1, 2, \ldots, r-1, r+1, r+2, \ldots, k).$$

Let $a$ and $b$ be two numbers such that $0 < a < b < 1$, and let $m^*$ be a non-negative integer such that $m^* \le m$ and $m^* \le [\gamma(m+n)]$, where $[\gamma(m+n)]$ denotes the largest integer $\le \gamma(m+n)$. Let $Q_m(a, b, m^*)$ denote the probability that, if $m^*$ observations are made on X and $[\gamma(m+n)] - m^*$ observations are made on Y, the following conditions will be fulfilled:

(a) the total number of observations $< a$ is exactly $i_{rm} - 1$;

(b) all observations are $< b$;

(c) if the $[\gamma(m+n)]$ observations are arranged in ascending order and if $v_j = 0$ or 1 according as the $j$th element is an observation on X or on Y, then

$$(35)\qquad v_{i_{jm}} = v'_{i_{jm}}\qquad (j = 1, 2, \ldots, r-1),$$

and

$$(36)\qquad v_{i_{jm}} = v'_{i_{jm}}\qquad (j = r+1, r+2, \ldots, k).$$

It is easy to see that the probability $P_0$ of the simultaneous fulfillment of the relations (34) and of $v_{i_{rm}} = 0$ is given by

$$P_0 = \int_0^1 \int_0^b \sum_{m^*} R_m(a, b, m^*)\, m' (1-b)^{m'-1} \bigl(1 - g(b)\bigr)^{n'}\, da\, db, \tag{37}$$

where

$$R_m(a, b, m^*) = {}_mC_{m^*}\; {}_nC_{[\gamma(m+n)]-m^*}\; \frac{\partial Q_m(a, b, m^*)}{\partial b}, \tag{38}$$

$$m' = m - m^*, \tag{39}$$

and

$$n' = n - [\gamma(m+n)] + m^*. \tag{40}$$

Similarly, the probability $P_1$ of the simultaneous fulfillment of the relations (34) and of $v_{i_{rm}} = 1$ is given by

$$P_1 = \int_0^1 \int_0^b \sum_{m^*} R_m(a, b, m^*)\, n' g'(a) (1-b)^{m'} \bigl(1 - g(b)\bigr)^{n'-1}\, da\, db. \tag{41}$$
Then

$$\frac{P\{v_{i_{rm}} = 0\}}{P\{v_{i_{rm}} = 1\}} = \frac{P_0}{P_1}. \tag{42}$$

Let $\bar n = \sum_{j > [\gamma(m+n)]} v_j$ and $\bar m = m + n - [\gamma(m+n)] - \bar n$. The variables

$$z_{[\gamma(m+n)]} - a_\gamma, \qquad \frac{\bar m}{\bar n} - \frac{\alpha(1 - a_\gamma)}{1 - g(a_\gamma)}$$

all converge stochastically to zero.

Let $P_0(\epsilon)$ and $P_1(\epsilon)$ denote the values of the right members of (37) and (41), respectively, if the integration is restricted to the region where $a < b$, $|a - a_{rm}| < \epsilon$, $|b - a_\gamma| < \epsilon$, and the summation is restricted to those values of $m^*$ for which

$$\left| \frac{m'}{n'} - \frac{\alpha(1 - a_\gamma)}{1 - g(a_\gamma)} \right| < \epsilon.$$

Hence, because of the aforementioned stochastic convergence, for all sufficiently large $m$

$$|P_t(\epsilon) - P_t| < \epsilon \qquad (t = 0, 1). \tag{43}$$


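Since $P_1 > 0$, for sufficiently large $m$ also

$$\left| \frac{P_0(\epsilon)}{P_1(\epsilon)} - \frac{P_0}{P_1} \right| < \epsilon. \tag{44}$$

Since $g(x)$ and $g'(x)$ are continuous in the interval $[0, 1]$ and hence uniformly continuous, it is clear that

$$\left| \frac{P_0(\epsilon)}{P_1(\epsilon)} - \frac{\alpha}{g'(a_{rm})} \right| < c\,\epsilon, \tag{45}$$

where $c$ is a fixed constant independent of $m$. From (44) and (45) it follows easily that, for any arbitrarily small $\epsilon'$,

$$\left| \frac{P_0}{P_1} - \frac{\alpha}{g'(a_{rm})} \right| < \epsilon' \tag{46}$$

for sufficiently large $m$. Since $P\{v_{i_{rm}} = 1\} = P_1/(P_0 + P_1)$, the required relation (31) follows. This completes the proof of Lemma 1.

LEMMA 2. If conditions a, b, and c of Lemma 1 are satisfied, then

$$\lim_{m\to\infty} E\left(\frac{U}{m}\right) = 2\int_0^1 \frac{g'(x)}{\alpha + g'(x)}\, dx \tag{47}$$

and

$$\lim_{m\to\infty} \sigma^2\left(\frac{U}{m}\right) = 0. \tag{48}$$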


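PROOF: Since

$$\frac{U}{m} = \frac{1}{m} + \frac{1}{m}\sum_{j=2}^{m+n} (v_j - v_{j-1})^2 = \frac{1 + v_1 + v_{m+n}}{m} + \frac{2}{m}\sum_{j=2}^{m+n-1} v_j - \frac{2}{m}\sum_{j=2}^{m+n} v_{j-1}v_j, \tag{49}$$

we have from Lemma 1,

$$E\left(\frac{U}{m}\right) = \frac{2}{m}\sum_{2 \le j \le \gamma(m+n)} \left[\frac{g'(a_{jm})}{\alpha + g'(a_{jm})} - \left(\frac{g'(a_{jm})}{\alpha + g'(a_{jm})}\right)^2\right] + \eta(m) + \eta^*(\gamma) = \frac{2}{m}\sum_{2 \le j \le \gamma(m+n)} \frac{\alpha\, g'(a_{jm})}{\bigl(\alpha + g'(a_{jm})\bigr)^2} + \eta(m) + \eta^*(\gamma), \tag{50}$$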
where

$$\lim_{m\to\infty} \eta(m) = 0, \qquad \lim_{\gamma\to 1} \eta^*(\gamma) = 0, \tag{51}$$

and $a_{jm}$ is the root of the equation

$$m\,a_{jm} + n\,g(a_{jm}) = j \qquad (j = 2, \ldots, m+n). \tag{52}$$


From equation (52) it follows that

$$\lim_{m\to\infty} \bigl(a_{jm} - a_{(j-1)m}\bigr)\bigl(m + n\,g'(a_{jm})\bigr) = 1 \tag{53}$$

uniformly in $j$. Since $\gamma$ may be chosen arbitrarily near to 1, the required result (47) follows easily from (50).
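Relation (53), which links the roots of (52) to their spacing, can be checked numerically. The sketch below reads (52) as $m a_{jm} + n g(a_{jm}) = j$. The alternative $g(x) = x^2$ and the sample sizes are illustrative assumptions. The roots are found by plain bisection, which is not a method taken from the paper.

```python
# Sketch: check relation (53) for the roots a_jm of (52), read as
# m * a + n * g(a) = j.  The alternative g(x) = x**2 and the sizes
# below are illustrative assumptions, not taken from the paper.


def root_a(j, m, n, g, tol=1e-13):
    """Bisection for the unique a in [0, 1] with m*a + n*g(a) = j.

    Assumes g is continuous and non-decreasing with g(0) = 0, g(1) = 1,
    so that m*a + n*g(a) increases from 0 to m + n.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if m * mid + n * g(mid) < j:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_deviation_53(m, n, g, g_prime):
    """max over j = 2..m+n of |(a_jm - a_(j-1)m) * (m + n*g'(a_jm)) - 1|."""
    a = [root_a(j, m, n, g) for j in range(1, m + n + 1)]   # a[k] is a_(k+1)m
    return max(
        abs((a[k] - a[k - 1]) * (m + n * g_prime(a[k])) - 1.0)
        for k in range(1, m + n)
    )


def g(x):
    return x * x          # assumed c.d.f. of Y on [0, 1]


def g_prime(x):
    return 2.0 * x


for size in (250, 1000):
    print(size, max_deviation_53(size, size, g, g_prime))  # shrinks roughly like 1/m
```

For $g(x) = x^2$ the deviation is bounded by about $2/m$, because $m + n g'$ changes by at most $2n$ times the spacing across one step.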










It remains to consider the variance of $U/m$. The expression

$$\frac{1 + v_1 + v_{m+n}}{m} + \frac{2}{m}\sum_{j=2}^{m+n-1} v_j$$

equals $2n/m + (1 - v_1 - v_{m+n})/m$, since $\sum_{j=1}^{m+n} v_j = n$. Hence it differs from $2/\alpha$ by at most $1/m$, so its variance converges to zero as $m \to \infty$.

In order to prove (48), it will be sufficient to show that the variance of

$$W = \frac{1}{m}\sum_{j=2}^{m+n} v_{j-1} v_j \tag{54}$$

goes to zero with increasing $m$. From Lemma 1 it follows that

$$-z(m) < E(v_i v_j v_k v_l) - E(v_i v_j)\,E(v_k v_l) < z(m), \tag{55}$$

where $\lim_{m\to\infty} |z(m)| = 0$, provided only that the integers $i, j, k, l$ are distinct and $\le \gamma(m+n)$. The variance of $mW$ is the sum of terms of the type occurring in (55). The number of terms for which $i, j, k, l$ are distinct is of the order $m^2$. All other terms are of size at most 2, and their number is of the order $m$. Since the number $\gamma$ may be chosen arbitrarily near to 1, the variance of $W$ converges to zero as $m \to \infty$.

This proves LEMMA 2. 
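Identity (49), and the bound used in the variance step, can be verified exhaustively for short arrangements. In this sketch the 0/1 list `v` stands for the pooled ordered sample, with $v_j = 0$ for an $X$ and 1 for a $Y$. $U$ is counted as one plus the number of changes. The example arrangement is made up.

```python
def runs(v):
    """Number of runs U: 1 plus the number of adjacent positions where v changes."""
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)


def decomposition_49(v):
    """m times the right member of (49), split into its two parts.

    linear   = 1 + v_1 + v_N + 2 * (v_2 + ... + v_{N-1})
    products = 2 * (v_1 v_2 + v_2 v_3 + ... + v_{N-1} v_N)
    with N = m + n = len(v) >= 2 and each v_j in {0, 1}.
    """
    N = len(v)
    linear = 1 + v[0] + v[-1] + 2 * sum(v[1:N - 1])
    products = 2 * sum(v[j - 1] * v[j] for j in range(1, N))
    return linear, products


v = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]   # made-up arrangement: 0 = X, 1 = Y
linear, products = decomposition_49(v)
n = sum(v)
print(runs(v), linear - products)     # 6 6
print(linear, 2 * n)                  # 10 10: the linear part is within 1 of 2n
```

The identity rests only on $v_j^2 = v_j$, so it holds for every 0/1 arrangement with at least two elements.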

LEMMA 3. If conditions a, b, and c of Lemma 1 are fulfilled, and if (24) holds, then

$$T = \int_0^1 \frac{g'(x)}{\alpha + g'(x)}\,dx < \frac{1}{1+\alpha}. \tag{56}$$

PROOF: Let $a_1 < a_3$ be any two real numbers and designate $(a_1 + a_3)/2$ by $a_2$. Let $F(x)$ be defined as follows:

$$F(a_1) = 0, \qquad F(x) = (x - a_i)\,b_i + F(a_i) \qquad (a_i \le x \le a_{i+1};\ i = 1, 2). \tag{57}$$

Let $c$ be defined by

$$F(a_3) = c\,(a_3 - a_1). \tag{58}$$

Then it is easy to verify that the maximum of

$$\int_{a_1}^{a_3} \frac{F'(x)}{\alpha + F'(x)}\,dx \tag{59}$$

with respect to $b_1$ and $b_2$, subject to the restrictions that $b_1$ and $b_2$ be non-negative and that $a_1$, $a_3$ and $c$ be fixed $(c > 0)$, occurs when and only when

$$b_1 = b_2 = c. \tag{60}$$
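Now define

$$p_{0j} = 0, \qquad p_{ij} = \frac{i}{2^j}, \qquad S_j = \sum_{i=1}^{2^j} \frac{\bigl[g(p_{ij}) - g(p_{(i-1)j})\bigr](p_{ij} - p_{(i-1)j})}{\alpha\,(p_{ij} - p_{(i-1)j}) + g(p_{ij}) - g(p_{(i-1)j})} \qquad (j = 0, 1, 2, \ldots). \tag{61}$$

Repeated application of the result of the previous paragraph easily gives

$$S_j \ge S_{j+1}. \tag{62}$$

From (24) it follows that there exists a positive integer $j'$ such that $S_{j'} > S_{j'+1}$. Obviously

$$S_0 = \frac{1}{1+\alpha}, \tag{63}$$

$$\lim_{j\to\infty} S_j = T. \tag{64}$$

Hence $T \le S_{j'+1} < S_{j'} \le S_0 = 1/(1+\alpha)$, and LEMMA 3 is proved.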
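The dyadic argument of Lemma 3 can be illustrated numerically. The sketch takes $S_j$ to be the integral of $F'/(\alpha + F')$ for the broken line through the points $(i/2^j, g(i/2^j))$; this is our reading of (61) together with (57) to (60). It uses the illustrative alternative $g(x) = x^2$ with $\alpha = 1$, for which $T = 1 - \tfrac{1}{2}\log 3$.

```python
import math


def S(j, g, alpha):
    """S_j: integral of F'/(alpha + F') for the broken line through (i/2**j, g(i/2**j)).

    On a dyadic piece of length dp with rise dg, F' = dg/dp is constant, so the
    piece contributes dp * (dg/dp) / (alpha + dg/dp) = dg*dp / (alpha*dp + dg).
    """
    dp = 2.0 ** -j
    total = 0.0
    for i in range(1, 2 ** j + 1):
        dg = g(i * dp) - g((i - 1) * dp)
        total += dg * dp / (alpha * dp + dg)
    return total


def g_alt(x):
    return x * x                        # assumed alternative, g(0) = 0, g(1) = 1


alpha = 1.0
T = 1.0 - 0.5 * math.log(3.0)           # exact integral of 2x / (1 + 2x) over [0, 1]

values = [S(j, g_alt, alpha) for j in range(13)]
print(values[:4])                       # decreasing from S_0 = 1 / (1 + alpha) = 0.5
print(values[-1], T, 1 / (1 + alpha))   # S_j approaches T, and T < 1/(1 + alpha)
```

When $g(x) = x$, every piece has slope 1, so $S_j = 1/(1+\alpha)$ for all $j$. The strict drop in (62) therefore appears only when $g$ differs from $f$.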

PROOF OF THEOREM II: Let $\delta_1 > \delta_2 > \cdots > \delta_j > \cdots$ be an arbitrary but fixed sequence such that $\lim \delta_j = 0$. For $\delta = \delta_j$, let $I_{1j}, \ldots, I_{k(j)j}$ be a set of closed intervals such that no two intervals have an interior point in common and within which, by condition (A), $f'(x)$ and $g'(x)$ exist, are positive, and are continuous. Let $I_{0j}$ be the complementary set (with respect to the whole line). (It is easy to see that, if condition (A) is fulfilled, such a system can be constructed.) Let $U_{ij}$ $(i = 1, 2, \ldots, k(j))$ and $U_{0j}$ denote, respectively, the runs caused by the observations which fall in the intervals $I_{ij}$ and $I_{0j}$. Then

$$\Bigl| U - \sum_{i=1}^{k(j)} U_{ij} - U_{0j} \Bigr| \le 2k(j). \tag{65}$$


















160 A. WALD AND J. WOLFOWITZ 


From condition (A) it follows that, with a probability arbitrarily close to 1, for sufficiently large $m$,

$$U_{0j} < 3p\,m\,\delta_j \qquad (j = 1, 2, \ldots), \tag{66}$$

where $p = \max[1, 1/\alpha]$.

Let $[a_i \le x \le b_i]$, $i = 1, 2, \ldots$, denote the interval $I_i$, and let $m_i$ and $n_i$ denote the number of observations on $X$ and $Y$, respectively, which fall in the interval $I_i$. Then $m_i/m$ and $n_i/n$ converge stochastically with increasing $m$ to $[f(b_i) - f(a_i)]$ and $[g(b_i) - g(a_i)]$, respectively.
Within the interval $I_i$ $(i = 1, 2, \ldots, k)$ we perform the transformation

$$X^* = f(X), \qquad Y^* = f(Y), \tag{67}$$

which leaves $U_i$ invariant. For fixed $m_i$ and $n_i$, the relative distribution of $X^*$ is uniform and the relative distribution of $Y^*$ fulfills condition (c) of Lemma 1. Hence from LEMMA 2 we obtain that $U_i/m$ converges stochastically to its expected value, and that

$$\lim_{m\to\infty} E\left(\frac{U_i}{m}\right) \le \frac{2\,[f(b_i) - f(a_i)]\,[g(b_i) - g(a_i)]}{[g(b_i) - g(a_i)] + \alpha\,[f(b_i) - f(a_i)]}. \tag{68}$$

It can be verified that the sum of the second members in (68) over all values of $i$ is less than or equal to $2/(1+\alpha)$.

From (24) and condition (A) we get that, for sufficiently small $\delta_j$, there exists at least one interval for which the first member of (68) is less than the second member. Hence

$$S < \frac{2}{1+\alpha}, \tag{69}$$

where

$$S = \sum_{i=1}^{k} \lim_{m\to\infty} E\left(\frac{U_i}{m}\right). \tag{70}$$
Now take $j$ so large that

$$3p\,\delta_j < \epsilon, \tag{71}$$

where

$$2\epsilon = \frac{2}{1+\alpha} - S. \tag{72}$$

Since $U_i/m$ converges stochastically to its expected value, it follows from (65), (66), (70), (71), and (72) that, with a probability arbitrarily close to 1, for sufficiently large $m$,

$$\frac{U}{m} < \frac{2}{1+\alpha} - \epsilon. \tag{73}$$

From (23) and THEOREM I we get

$$\lim_{m\to\infty} \frac{u_\beta(m)}{m} = \frac{2}{1+\alpha}. \tag{74}$$

THEOREM II follows easily from (73) and (74).
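A Monte Carlo sketch of the behaviour behind Theorem II follows. With $X$ uniform, the average of $U/m$ settles near $2/(1+\alpha)$ when $g(x) = x$. Otherwise it settles near $2\int_0^1 g'(x)/(\alpha + g'(x))\,dx$, which Lemma 3 places strictly below that value. We assume $\alpha = m/n$, which is our reading of the limits in (73) and (74). The alternative $g(x) = x^2$, the sizes, the seed and the number of replications are arbitrary choices.

```python
import math
import random


def runs_statistic(xs, ys):
    """Number of runs U in the pooled ordered sample (X labelled 0, Y labelled 1)."""
    labels = [lab for _, lab in sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])]
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def mean_u_over_m(m, n, draw_y, reps, rng):
    """Monte Carlo average of U/m with X uniform on [0, 1] (f(x) = x)."""
    total = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(m)]
        ys = [draw_y(rng) for _ in range(n)]
        total += runs_statistic(xs, ys) / m
    return total / reps


rng = random.Random(2024)
m = n = 2000
alpha = m / n                                    # assumed reading: alpha = m / n
null_avg = mean_u_over_m(m, n, lambda r: r.random(), 40, rng)
alt_avg = mean_u_over_m(m, n, lambda r: math.sqrt(r.random()), 40, rng)  # g(x) = x**2
print(null_avg, 2 / (1 + alpha))                 # near the null limit in (74)
print(alt_avg, 2 - math.log(3))                  # near 2 * integral of 2x/(1+2x), below the null limit
```

Because $U/m$ concentrates below $2/(1+\alpha)$ under the alternative while the critical value divided by $m$ tends to $2/(1+\alpha)$, the rejection probability tends to one. That is the content of (73) and (74).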


8. Remarks on a proposed test. We have already remarked in Section 3 that 
the test proposed by W. R. Thompson is not consistent. To show this, we shall
give two distribution functions f(x) and g(x) such that, although these functions 
will be very different, the probability of rejecting the hypothesis that they are 
the same will not approach one as the sample number approaches infinity. 

Suppose, to simplify the notation, that the observations have been ordered according to size, i.e., that $x_1 < x_2 < \cdots < x_m$ and $y_1 < y_2 < \cdots < y_n$. Suppose further that $m = n$, and that the test is to be made on a level of significance $\beta > 0$. In the right member of (2) we need not exhibit $n$, and we shall replace $k$ and $k'$ by $k(m)$ and $k'(m)$ to show the dependence on $m$. We have, under the null hypothesis,

$$P\{x_{k(m)} < y_{k'(m)}\} = W(m, k(m), k'(m)) = \beta. \tag{75}$$

The sequence $k(m)/m$ is bounded, so there exists a monotonically increasing subsequence $m_1, m_2, \ldots$ of the sequence of integers $1, 2, \ldots$ and a number $h$, $0 \le h \le 1$, such that

$$\lim_{i\to\infty} \frac{k(m_i)}{m_i} = h. \tag{76}$$

It is easy to see that then also

$$\lim_{i\to\infty} \frac{k'(m_i)}{m_i} = h. \tag{77}$$


We shall now assume that $0 < h < 1$. If $h = 0$ or 1, only a trivial alteration will be needed in the argument to follow. Let $\epsilon$ and $\delta$ be arbitrarily small positive numbers. We now consider two populations, A and B, described as follows:

A) $f(x) = g(x) = x \quad (0 \le x \le 1)$,

B) $f(x) = x \quad (0 \le x \le 1)$,

$$g(x) = g(a_i) + \frac{(x - a_i)\bigl(g(a_{i+1}) - g(a_i)\bigr)}{a_{i+1} - a_i} \qquad (a_i \le x \le a_{i+1};\ i = 0, 1, \ldots, 4),$$

where

$$\begin{array}{ll}
a_0 = 0, & g(a_0) = 0, \\
a_1 = h - 2\delta > 0, & g(a_1) = a_2, \\
a_2 = h - \delta, & g(a_2) = a_2, \\
a_3 = h + \delta < 1 - \delta, & g(a_3) = a_3, \\
a_4 = 1 - \delta, & g(a_4) = a_3, \\
a_5 = 1, & g(a_5) = 1.
\end{array}$$


The definition of $f(x)$ and $g(x)$ outside the interval $0 \le x \le 1$ is obvious. It will be shown that even for populations as different as A and B, and for samples larger than any arbitrarily assigned number, the probability of rejecting the null hypothesis when B is true will be at most $\beta + \epsilon$.

Let $h_1, h_2, h_3$ denote the number of observations on $X$ which fall in the intervals $0 < x < a_2$, $a_2 < x < a_3$, $a_3 < x < 1$, respectively ($m$ fixed, of course). Let $h'_1, h'_2, h'_3$ be the corresponding numbers for $Y$. For a fixed $m$, the probability of a set $h_1, h_2, h_3, h'_1, h'_2, h'_3$ is the same whether the sample is drawn from population A or B. From (76), (77), and the multinomial law, it follows that for all sufficiently large $m_i$ the probability is at least $1 - \epsilon$ that a set $h_1, h_2, h_3, h'_1, h'_2, h'_3$ occurs for which $x_{k(m_i)}$ and $y_{k'(m_i)}$ both fall in the interval $a_2 < x < a_3$. Furthermore, it is obvious that for all samples with fixed $h_2, h'_2$ the distribution within the interval $a_2 < x < a_3$ is the same whether the sample came from population A or B. Hence even when the sample is drawn from population B, the first member of (75) is $\le \beta + \epsilon$. This completes the proof of the inconsistency of the test based on (75).
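The construction can be simulated. The sketch makes these assumptions:

- The knots are $a_1 = h - 2\delta$, $a_2 = h - \delta$, $a_3 = h + \delta$ and $a_4 = 1 - \delta$, with $g(a_1) = g(a_2) = a_2$ and $g(a_3) = g(a_4) = a_3$.
- For illustration, $h = 1/2$ and $k = k' = m/2$.

It estimates only the probability in the first member of (75). The test's exact rejection rule, (2), lies outside this excerpt. $Y$ under B is sampled by inverting $g$.

```python
import random


def b_inverse_cdf(h, delta):
    """Inverse of population B's piecewise-linear g, assuming knots
    a_1 = h - 2*delta, a_2 = h - delta, a_3 = h + delta, a_4 = 1 - delta with
    g(a_1) = g(a_2) = a_2 and g(a_3) = g(a_4) = a_3 (flat pieces carry no mass)."""
    a1, a2, a3, a4 = h - 2 * delta, h - delta, h + delta, 1 - delta

    def inv(u):
        if u <= a2:                                   # g rises from 0 to a_2 on [0, a_1]
            return a1 * u / a2
        if u <= a3:                                   # g(x) = x on [a_2, a_3]
            return u
        return a4 + (u - a3) * (1 - a4) / (1 - a3)    # g rises from a_3 to 1 on [a_4, 1]

    return inv


def prob_x_k_below_y_kp(m, k, kp, y_inv, reps, rng):
    """Monte Carlo estimate of P{x_(k) < y_(k')}, X uniform, Y = y_inv(uniform)."""
    hits = 0
    for _ in range(reps):
        xs = sorted(rng.random() for _ in range(m))
        ys = sorted(y_inv(rng.random()) for _ in range(m))
        hits += xs[k - 1] < ys[kp - 1]
    return hits / reps


rng = random.Random(7)
m, k, kp = 400, 200, 200                              # m = n and k/m = k'/m = h = 0.5
p_a = prob_x_k_below_y_kp(m, k, kp, lambda u: u, 1000, rng)
p_b = prob_x_k_below_y_kp(m, k, kp, b_inverse_cdf(0.5, 0.05), 1000, rng)
print(p_a, p_b)                                       # both close to 0.5
```

Under B, $Y$ has no mass in $(a_1, a_2)$ or $(a_3, a_4)$, yet the two probabilities agree up to Monte Carlo error. Both order statistics almost always fall in $(a_2, a_3)$, where A and B coincide.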

This test is consistent if the alternatives to the null hypothesis are limited, 
for example, to those where g(x) = f(x + c), c a constant. 





REFERENCES

[1] WILLIAM R. THOMPSON, Annals of Math. Stat., Vol. 9 (1938), p. 281.
[2] MAURICE FRÉCHET, Généralités sur les Probabilités. Variables aléatoires, Paris (1937).
[3] J. NEYMAN AND E. S. PEARSON, Statistical Research Memoirs, University College, London, Vol. 1 (1936).








COLUMBIA UNIVERSITY,
NEW YORK, N. Y.

THE SUBSTITUTIVE MEAN AND CERTAIN SUBCLASSES OF THIS 
GENERAL MEAN 


By EDWARD L. DODD


1. Introduction. No general agreement has been reached, so far as I know, 
as to what constitutes a mean. A necessary condition which appears to meet 
with general approval is that a single-valued mean of a set of numbers all equal 
to a constant $c$ should itself be equal to $c$. However, there appears to be some
valid objection against imposing any other proposed condition as necessary. 

Of course, intermediacy is a condition that suggests itself at once. Indeed, 
in certain mean value theorems in general analysis—such as the First Theorem 
of the Mean for integral calculus, which I mention in Section 3—intermediacy 
is the main feature. 

However, O. Chisini [1] insisted that intermediacy or internality is not the 
chief characteristic of a statistical mean. Rather, a mean is a number to take 
the place, by substitution, of each of a set of numbers in general different. 
Such a mean may well be called a representative or substitutive mean. 


Chisini defined $m$ to be a mean of $x_1, x_2, \ldots, x_n$, relative to a function $F$, provided that

$$F(m, m, \ldots, m) = F(x_1, x_2, \ldots, x_n). \tag{1.1}$$

If, for example,

$$F(x_1, x_2, \ldots, x_n) = \Sigma x_i^2 = \Sigma m^2 = n m^2, \tag{1.2}$$

the mean $m$ thus obtained is the root-mean-square

$$m = \pm\left[(1/n)\,\Sigma x_i^2\right]^{1/2}. \tag{1.3}$$


The choice of F, Chisini noted, depended upon the use to be made of the 
mean. 


Suppose now that $f(x_1, x_2, \ldots, x_n)$ is such a function that one value of

$$f(x, x, \ldots, x) = x. \tag{1.4}$$

And suppose that this $f$ is taken as a particular $F$ for (1.1) to determine a mean $m$ implicitly; thus

$$f(m, m, \ldots, m) = f(x_1, x_2, \ldots, x_n). \tag{1.5}$$

Then, from (1.5) and (1.4), it follows that one value of

$$f(x_1, x_2, \ldots, x_n) = m. \tag{1.6}$$


And thus f determines the mean m both explicitly and implicitly. 
It should be noted that the $F = \Sigma x_i^2$ in (1.2) is not itself a mean of the $x_i$.

If, in (1.2), we take $x_1 = -2$, $x_2 = 1$, $x_3 = 1$, then the double-valued mean $m = \pm 2^{1/2}$ results. Now $-2^{1/2}$ is internal, i.e., $-2 < -2^{1/2} < 1$; but $2^{1/2}$ is external, for $2^{1/2} > 1 > -2$. Since here $\Sigma x_i = 0$, it also follows that the standard deviation of $-2, 1, 1$ is the external mean $2^{1/2}$. Chisini [1], indeed, used the root mean square to show the possibility of external means. External means have been noted by other writers [2-7].
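The numerical example can be reproduced directly. The sketch solves $n m^2 = \Sigma x_i^2$ for both roots and checks each root for intermediacy. The function names are ours.

```python
import math
import statistics


def chisini_means_sum_of_squares(xs):
    """Both roots m of F(m, ..., m) = F(x_1, ..., x_n) with F = sum of squares,
    i.e. n * m**2 = sum(x**2)."""
    r = math.sqrt(sum(x * x for x in xs) / len(xs))
    return -r, r


def is_internal(value, xs):
    """Intermediacy: does value lie between the smallest and largest x_i?"""
    return min(xs) <= value <= max(xs)


xs = [-2.0, 1.0, 1.0]
neg_root, pos_root = chisini_means_sum_of_squares(xs)
print(neg_root, is_internal(neg_root, xs))   # -1.414..., True
print(pos_root, is_internal(pos_root, xs))   # 1.414..., False
print(statistics.pstdev(xs))                 # also 1.414..., since sum(xs) = 0
```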

It is noteworthy that a number of writers [8-12] have used the condition (1.4) (in general, with $f$ single-valued) as one of a set of axioms to characterize particular means. Sometimes this has appeared in the weaker form $f(1, 1, \ldots, 1) = 1$.

This paper will be concerned primarily with the mean of a finite number $n$ of variates $x_1, x_2, \ldots, x_n$. Possible generalizations will be mentioned briefly in Section 8.

In the conception of the substitutive mean, m, as I have been using it for some 
time, emphasis is laid upon the explicit form for m; and provision is made for 
multiple values. 


DEFINITION OF THE SUBSTITUTIVE MEAN. Let $f(x_1, x_2, \ldots, x_n)$ be a function of $n$ variables $x_1, x_2, \ldots, x_n$, defined at least for one set of equal values, $x_i = k$. If $c$ is any number such that $f(c, c, \ldots, c)$ is defined, let one value of

$$f(c, c, \ldots, c) = c. \tag{1.7}$$

Then $f(x_1, x_2, \ldots, x_n)$ will be said to be a substitutive mean of $x_1, x_2, \ldots, x_n$.

If an original formulation of a problem does not assign a value to a function when the variables are all equal, it is sometimes possible to assign such values by continuity considerations, such as are commonly used in the "evaluation" of indeterminate forms. This will be discussed in Section 6.

In the following, when the word mean is used, it will designate the substitutive mean as defined above.











2. Classification of Means already made. Some general classes of means 
have already been distinguished. One important basis for a classification of 
means is the kind of data to be used. The data may be only qualitatively 
distinguishable. Then numbers may be assigned to qualities. For dealing in 
a very general way with all kinds of data, C. Gini and L. Galvani [13], and 
G. Pietra [14], distinguished between data in rectilineal series, in cyclical series, 
and in unconnected series. These three classes are associated respectively with 
the straight line, the circle, and a regular polyhedron (in three dimensions, the 
regular tetrahedron, and in n dimensions, a polyhedron with n + 1 vertices each 
at the same distance from each of the other n vertices). 

For one definition of the arithmetic mean of a cyclical series, Gini uses the 
center of gravity principle; and this mean is computed with the aid of sines and 
cosines. By mechanical means, such an arithmetic mean of dates (for example, of dates of weddings) as days of a year can be found. On the rim of a wheel
delicately suspended and marked off for the 365 days or 366 days of a year, let 
small weights proportional to the number of weddings on a day be placed in the 
spaces assigned to the individual days. Then when the wheel comes to rest, 
the arithmetic mean of the dates will be found at the lowest point of the rim. 
In the special case where the center of gravity of the system is at the center of 
the circle, the mean is indeterminate, or we may say that every day is a mean 
day. 

Also, for cyclical series the arithmetic mean and the median are defined by 
other methods, using such principles as minimizing the sum of the squares of 
deviations or the sum of the absolute deviations. 

The properties of means may be made the basis of a classification, either those 
properties which have been evolved by writers [8-12], [15-18] who have char- 
acterized specific means by sets of axioms, or those properties which seem of 
special importance in making distinctions. Two such properties will now be 
mentioned. 

Gini [19] recognizes two large classes of means: "A) medie ferme, B) medie lasche" [A) firm means, B) loose means]. The latter (loose) class includes the median and the mode, whose values do not depend upon all the data. To describe this latter mean $m$ of arguments $x_i$, we might write $\partial m/\partial x_i = 0$ as applying to several, if not most, of the arguments over wide ranges instead of at isolated points.

Subclasses of A or firm means as given by Gini will be discussed in Section 4. 

Another rather large classification distinguishes between simple means and 
their weighted forms. In a case often encountered, where the weights are whole numbers indicating frequencies of occurrence, this distinction is of little significance. In the more general case, however, where weights may give ratings of the efficiency of measuring instruments, or where the weights may be negative [6, 20], more direct attention needs to be paid to the weighted forms.

To supplement classifications already proposed, I am indicating in the next 
section a descent from the substitutive mean, the most general of all means, 
down through two classes of means less general, which I am calling the summational mean and the quasi-arithmetic mean, to the more specific mean known as the associative mean, studied in particular by M. Nagumo [21], A. Kolmogoroff [22], and B. de Finetti [2].

The foregoing subclasses of the general or substitutive mean are based 
primarily on structure, the way the mean is formed. 


3. The Summational Mean, Quasi-Arithmetic Mean, and Associative Mean.
The summational mean, now to be defined, is a generalization of the weighted arithmetic mean

$$W = \frac{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}{c_1 + c_2 + \cdots + c_n}, \qquad \Sigma c_i \ne 0. \tag{3.1}$$
It is to be noted that, although $W$ is not a symmetric function of $x_i$, $W$ is a symmetric function of $c_i x_i$. In the generalization $Q$, the following features of $W$ are retained:

1. Certain weights $c_i$ being given, $Q$ is a symmetric function of $c_i x_i$.

2. This $Q$ may be determined from sums of $n$ terms, each term involving one and only one $x_i$.






DEFINITION. Let $\Sigma$ denote a summation for $i = 1, 2, \ldots, n$. Suppose that

$$F\{y,\ \Sigma f_1(c_i x_i, y),\ \Sigma f_2(c_i x_i, y),\ \ldots,\ \Sigma f_k(c_i x_i, y)\} = 0 \tag{3.2}$$

has a solution $y = Q$ which is a substitutive mean of $x_1, x_2, \ldots, x_n$. Then $Q$ will be called a summational mean of $x_1, x_2, \ldots, x_n$, relative to the functions $f_1, f_2, \ldots, f_k$, and $F$.

Sometimes it is possible to express $Q$ as

$$Q = G\{\Sigma g_1(c_i x_i),\ \Sigma g_2(c_i x_i),\ \ldots,\ \Sigma g_k(c_i x_i)\}. \tag{3.3}$$


Among summational means, those of most frequent use involve, in a special way, only one summation. Thus, with $\psi(x)$ a function which would usually be taken as continuous, this $m$ satisfies

$$\psi(m)\,\Sigma c_i = \Sigma c_i\,\psi(x_i). \tag{3.4}$$













But this, with $c_i > 0$, is just an algebraic analogue of, or prologue to, the First Theorem of the Mean for integral calculus, with the $c_i$ replaced by a positive integrable function. Without further specification, this mean $m$ may have an uncountably infinite number of values. But if it is required that $\psi(x)$ be a continuous increasing function, and that $c_i > 0$, then $m$ is unique.

In a series of papers, C. E. Bonferroni [20], [23-27] used means such as $m$ in (3.4) for statistical and actuarial problems. And, as he had the notion of substitution distinctly in mind [28], he was in a sense a forerunner of Chisini. E. L. Dodd [29] made use of a mean $m$ defined with the aid of $n$ continuous increasing functions $\psi_i(x)$, thus:

$$\Sigma c_i\,\psi_i(m) = \Sigma c_i\,\psi_i(x_i), \qquad c_i > 0. \tag{3.5}$$

If $g_i(x) = c_i\,\psi_i(x)$, this can be written

$$\Sigma g_i(m) = \Sigma g_i(x_i). \tag{3.6}$$


In one paper, C. E. Bonferroni [20], as already noted, used weights which 
might be either positive or negative. 

Some such mean as $m$ in (3.4) has been used by a number of writers. Here $\psi(m)$ is a weighted arithmetic mean of $\psi(x_i)$, and thus it is natural to call $m$ a quasi-arithmetic mean of $x_i$.

DEFINITION. Let $\Sigma c_i \ne 0$. If $m$ is a solution of

$$\psi(m)\,\Sigma c_i = \Sigma c_i\,\psi(x_i), \tag{3.4}$$

then $m$ will be called a quasi-arithmetic mean of $x_i$, with weights $c_i$, relative to the function $\psi(x)$.

Sufficient conditions for the existence of this mean $m$ are: (1) that $\psi(x)$ be continuous in the interval $I$, finite or infinite, in which the observations $x_i$ lie; and (2) that either $c_i > 0$ for each $i$, or $\psi(x)$ take on all real values as $x$ runs through $I$.

It will be helpful to picture geometrically the double transformation, or mirroring, represented by (3.4). Points $x_i$ on the horizontal axis are carried vertically to the curve $y = \psi(x)$ and then reflected horizontally to the $y$-axis. For the points $y_i$ on the $y$-axis thus obtained, the arithmetic mean $\bar y$, or "center of gravity," is found. Then $\bar y$ is carried horizontally to the curve and reflected vertically to the $x$-axis. The abscissas $m$ of the points on the $x$-axis thus obtained are means of the given $x_i$, relative to this $\psi(x)$.
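The mirroring construction amounts to solving (3.4) for $m$. This is a minimal sketch. It assumes $c_i > 0$ and $\psi$ continuous and strictly increasing, so the root is unique and lies between the smallest and largest $x_i$. It finds $m$ by bisection. The choices of $\psi$ in the usage lines are standard special cases, not ones taken from the text.

```python
import math


def quasi_arithmetic_mean(xs, cs, psi, tol=1e-12):
    """Solve psi(m) * sum(c_i) = sum(c_i * psi(x_i)) for m, as in (3.4).

    Assumes c_i > 0 and psi continuous and strictly increasing on
    [min(xs), max(xs)], so the root is unique and lies in that interval.
    Bisection is used so that psi need not have a closed-form inverse.
    """
    target = sum(c * psi(x) for c, x in zip(cs, xs)) / sum(cs)   # the "center of gravity" y-bar
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


xs, cs = [1.0, 2.0, 4.0], [1.0, 1.0, 1.0]
print(quasi_arithmetic_mean(xs, cs, lambda x: x))          # arithmetic: 7/3
print(quasi_arithmetic_mean(xs, cs, math.log))             # geometric: 2
print(quasi_arithmetic_mean(xs, cs, lambda x: -1.0 / x))   # harmonic: 12/7
print(quasi_arithmetic_mean(xs, cs, lambda x: x * x))      # root mean square: sqrt(7)
```

The harmonic case uses $\psi(x) = -1/x$ rather than $1/x$ so that $\psi$ stays increasing. Any strictly monotone $\psi$ gives the same $m$.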

It may happen (Dodd [3, p. 746]) that the curve $y = \psi(x)$ contains horizontal segments, as in the curve for the temperature $y$ of ice-water-steam which has absorbed a quantity $x$ of heat. In this case the mean $m$ may be an "interval," an uncountable set of real numbers. Indeterminateness over an interval is a well known feature of the median of an even number of variates. In fact, a paper of D. Jackson [30] was written to indicate one method of selecting a single value from this interval of indeterminateness as the median.

It may be noted that a mean of n variables becomes, when n = 1, a function 
of a single variable; and thus it appears possible to implant in a mean of n 
variables almost any peculiarity found in a function of one variable. 

A special case of the quasi-arithmetic mean is the associative mean $m$, which, under some general conditions, has been shown [2, 21, 22] to satisfy

$$n\,\psi(m) = \Sigma\,\psi(x_i), \qquad i = 1, 2, \ldots, n, \tag{3.7}$$

where $\psi(x)$ is a continuous increasing function.

If $f_n(x_1, x_2, \ldots, x_n)$ is an associative mean, then, by definition, $f_n(x_1, x_2, \ldots, x_n)$ is unaltered when any $k$ of the $n$ variates are each replaced by the mean $f_k$ of that set.


4. The Gini means as summational. Having distinguished firm means from 
loose means, Gini [19] noted that in the former class, a variate might appear as 
a base, as an exponent, or both as base and exponent. In general, these variates 
are to be positive. Gini then listed ten means of a decidedly broad character, 
some of them generalizing the combinatorial means treated by A. Durand [31] 
and O. Dunkel [32]. See also G. Pietra [37]. 

These ten means involve only the four simple arithmetic operations and root 
extraction. For many purposes they are best expressed in the form given by 
the author. However, to show that these means are summational, logarithms 
will be used to reduce products to sums. 
$S^p = \Sigma x_i^p$;

${}_nC_c = n!/c!(n-c)!$, a binomial coefficient;

$P_c$ be any one of the ${}_nC_c$ products of $c$ different elements taken from $x_1, x_2, \ldots, x_n$;

$P_c^p$, the $p$th power of $P_c$;

$Z_c = \Sigma P_c$, the sum of all the ${}_nC_c$ products $P_c$;

$Z_c^p = \Sigma P_c^p$.

In the expressions which follow, it is assumed that the denominators are not zero.

The ten means, as defined in Gini's Equations I, II, ..., X, will be designated here by $m_1, m_2, \ldots, m_{10}$. Their logarithms, with arbitrary base, are as follows:

$$\begin{aligned}
\log m_1 &= (\log S^p - \log n)/p \\
\log m_2 &= (\log Z_c - \log {}_nC_c)/c \\
\log m_3 &= (\log Z_c^p - \log {}_nC_c)/cp \\
\log m_4 &= (\log S^p - \log S^q)/(p - q) \\
\log m_5 &= \Sigma x_i^p \log x_i / S^p \\
\log m_6 &= (\log Z_c - \log Z_d - \log {}_nC_c + \log {}_nC_d)/(c - d) \\
\log m_7 &= (\log Z_c^p - \log Z_d^p - \log {}_nC_c + \log {}_nC_d)/(c - d)p \\
\log m_8 &= (\log Z_c^p - \log Z_c^q)/c(p - q) \\
\log m_9 &= \Sigma P_c^p \log P_c / cZ_c^p \\
\log m_{10} &= (\log Z_c^p - \log Z_d^q - \log {}_nC_c + \log {}_nC_d)/(cp - dq)
\end{aligned} \tag{4.2}$$





As noted by the author, the foregoing include some well known special means. Thus $m_1$ is the power mean, which for $p = 1, 2, -1$ becomes, respectively, the arithmetic mean, the root mean square, and the harmonic mean. As $p \to 0$, the limit of $m_3$ and of $m_7$ is the geometric mean. If $p = 0, 1, 2$ and $q = p - 1$, then $m_4$ is, respectively, the harmonic, the arithmetic, and the contra-harmonic mean.
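The logarithmic forms (4.2) translate directly into code. This sketch evaluates all ten means for positive data. Parameter names follow the definitions above and (4.2). Nonzero denominators are assumed. The usage line prints two of the special cases just listed: the power mean with $p = 2$ and $m_4$ with $p = 2, q = 1$.

```python
import math
from itertools import combinations


def Z(xs, c, p=1.0):
    """Z_c^p: sum over all c-element subsets of (product of the subset) ** p."""
    return sum(math.prod(sub) ** p for sub in combinations(xs, c))


def gini_means(xs, p, q, c, d):
    """m_1 .. m_10 computed from the logarithmic forms (4.2).

    Assumes positive xs and nonzero denominators (p, p - q, c - d, cp - dq).
    """
    n = len(xs)
    log = math.log
    S = lambda r: sum(x ** r for x in xs)          # S^r
    C = lambda r: math.comb(n, r)                   # nC_r, a binomial coefficient
    logs = {
        1: (log(S(p)) - log(n)) / p,
        2: (log(Z(xs, c)) - log(C(c))) / c,
        3: (log(Z(xs, c, p)) - log(C(c))) / (c * p),
        4: (log(S(p)) - log(S(q))) / (p - q),
        5: sum(x ** p * log(x) for x in xs) / S(p),
        6: (log(Z(xs, c)) - log(Z(xs, d)) - log(C(c)) + log(C(d))) / (c - d),
        7: (log(Z(xs, c, p)) - log(Z(xs, d, p)) - log(C(c)) + log(C(d))) / ((c - d) * p),
        8: (log(Z(xs, c, p)) - log(Z(xs, c, q))) / (c * (p - q)),
        9: sum(math.prod(s) ** p * log(math.prod(s)) for s in combinations(xs, c))
           / (c * Z(xs, c, p)),
        10: (log(Z(xs, c, p)) - log(Z(xs, d, q)) - log(C(c)) + log(C(d))) / (c * p - d * q),
    }
    return {i: math.exp(v) for i, v in logs.items()}


means = gini_means([1.0, 2.0, 4.0], p=2, q=1, c=2, d=1)
print(means[1], means[4])   # root mean square sqrt(7) and contra-harmonic 21/7 = 3
```

All ten are substitutive means in the sense of (1.7): if every $x_i$ equals the same positive number, each formula returns that number.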

For each of the ten means, Gini gives an appropriate name. Those involving 
binomial coefficients are combinatorial, a mean like the contra-harmonic with 
denominator other than a constant is biplanar, the more simple means 
monoplanar. 


When in the following, I show that certain combinatorial expressions may be 
replaced by sums, it is not implied that this replacement would simplify
computation. 

To prove that $m_1, m_2, \ldots, m_{10}$ are all summational means, it may be noted that $n$, $p$, $q$, $c$, $d$, ${}_nC_c$, and ${}_nC_d$ are constants. Moreover, $S^p$ is the symmetric sum of the $p$th powers of $x_i$, with only one $x_i$ in each term, $i = 1, 2, \ldots, n$. And since $Z_c$, $Z_c^p$, $Z_d$, and $Z_d^p$ are symmetric polynomials in the $x_i$, they may be expressed as polynomials in $S^1, S^2, \ldots$ by a well known theorem of algebra. Hence, among the ten means, the only one that requires special attention is the ninth mean, $m_9$.

To show that $m_9$ is a summational mean, we need only examine the numerator of the right member. Let this numerator be $N$:

$$N = \Sigma P_c^p \log P_c. \tag{4.3}$$

Then

$$pN = (x_1^p x_2^p \cdots x_c^p)(\log x_1^p + \cdots + \log x_c^p) + \cdots. \tag{4.4}$$

Thus, if we set $y_i = x_i^p$, we may write

$$pN = (y_1 y_2 \cdots y_c)(\log y_1 + \cdots + \log y_c) + \cdots. \tag{4.5}$$

The coefficient of $\log y_1$ in this right member is the sum of all products of $c$ different factors which include $y_1$.

Now let $Y_r$ be the sum of the products of $r$ different factors taken from $y_1, y_2, \ldots, y_n$, and let $T_r$ be the sum of the products of $r$ different factors taken from $y_2, y_3, \ldots, y_n$. Then it is evident that

$$Y_r = T_r + y_1 T_{r-1}; \qquad T_r = Y_r - y_1 T_{r-1}. \tag{4.6}$$

If we now set $Y_0 = 1$, it follows that

$$T_{c-1} = Y_{c-1} - y_1 Y_{c-2} + y_1^2 Y_{c-3} - \cdots + (-1)^{c-1} y_1^{c-1} Y_0. \tag{4.7}$$

Hence, in $pN$, the coefficient of $\log y_1$ is

$$y_1 T_{c-1} = y_1 Y_{c-1} - y_1^2 Y_{c-2} + \cdots + (-1)^{c-1} y_1^{c} Y_0. \tag{4.8}$$

Thus, in $pN$, the terms containing $\log y_1$ are

$$Y_{c-1}\, y_1 \log y_1 - Y_{c-2}\, y_1^2 \log y_1 + \cdots + (-1)^{c-1} Y_0\, y_1^c \log y_1. \tag{4.9}$$

Now let

$$U_r = \Sigma y_i^r \log y_i. \tag{4.10}$$

Then

$$pN = Y_{c-1} U_1 - Y_{c-2} U_2 + \cdots + (-1)^{c-1} Y_0 U_c. \tag{4.11}$$

Thus $pN$ is here constructed from sums of $n$ terms with only a single $y_i$ in any term. Likewise, with $y_i$ replaced by $x_i^p$, each term contains only a single $x_i$.
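The reduction of $pN$ to single sums can be checked numerically. The sketch compares two computations:

- $pN = p\,\Sigma P_c^p \log P_c$, computed over all $c$-element subsets;
- $\sum_{r=1}^{c} (-1)^{r-1} Y_{c-r} U_r$, with $y_i = x_i^p$ and $U_r = \Sigma y_i^r \log y_i$.

The data are arbitrary positive numbers.

```python
import math
from itertools import combinations


def elementary(ys, r):
    """Y_r: sum of the products of r different factors taken from ys (Y_0 = 1)."""
    return sum(math.prod(sub) for sub in combinations(ys, r))


def pN_direct(xs, c, p):
    """p * N with N = sum over c-subsets of P_c**p * log P_c, as in (4.3)."""
    return p * sum(math.prod(s) ** p * math.log(math.prod(s)) for s in combinations(xs, c))


def pN_from_single_sums(xs, c, p):
    """Y_{c-1} U_1 - Y_{c-2} U_2 + ... + (-1)**(c-1) Y_0 U_c with y_i = x_i**p,
    where each U_r = sum y_i**r log y_i involves a single y_i per term."""
    ys = [x ** p for x in xs]
    U = lambda r: sum(y ** r * math.log(y) for y in ys)
    return sum((-1) ** (r - 1) * elementary(ys, c - r) * U(r) for r in range(1, c + 1))


xs = [1.5, 2.0, 3.0, 5.0, 0.7]
for c in (1, 2, 3):
    print(c, pN_direct(xs, c, 1.7), pN_from_single_sums(xs, c, 1.7))   # pairs agree
```

The first function needs all ${}_nC_c$ subsets. The second needs only the elementary symmetric sums $Y_r$ and the single sums $U_r$, which is the structural point of the argument rather than a computational shortcut.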
5. Transformations. A function $f(x_1, x_2, \ldots, x_n)$ is not in general a mean of its arguments $x_i$. However, it is often possible to make a substitution $x_i = \phi(y_i)$ so that

$$f[\phi(y_1), \phi(y_2), \ldots, \phi(y_n)] = g(y_1, y_2, \ldots, y_n) \tag{5.1}$$

is a mean of its arguments $y_i$.

The required substitution is sometimes obvious, as in the case of the estimate $s$ of scale, where $y_i = x_i - m$:

$$s = [(1/n)\,\Sigma(x_i - m)^2]^{1/2} = [(1/n)\,\Sigma y_i^2]^{1/2}. \tag{5.2}$$

Here $s$ is a mean of the $y_i$, although it is not a mean of the $x_i$.

DEFINITION. Let $y = \psi(x)$, in general multiple-valued, be defined in an interval $I$, finite or infinite, the values of $y$ lying in an interval $J$. Suppose that for each $y$ in $J$ there is at least one $x$ in $I$ such that $\psi(x) = y$. Let any such $x$ be designated by $\phi(y)$. Then $\phi(y)$ will be called the inverse of $\psi(x)$. It follows that one value of

$$\psi[\phi(y)] = y. \tag{5.3}$$


THEOREM. Let

$$z = f(x_1, x_2, \ldots, x_n), \tag{5.4}$$

in general multiple-valued, be defined when each $x_i$ is in some interval $I$, finite or infinite. With $x$ in $I$, set

$$\psi(x) = f(x, x, \ldots, x), \tag{5.5}$$

and suppose that $y = \psi(x)$ has an inverse, $x = \phi(y)$, defined in $J$. Let $x_i = \phi(y_i)$ be substituted into $f$ to form the function

$$w = f[\phi(y_1), \phi(y_2), \ldots, \phi(y_n)] = g(y_1, y_2, \ldots, y_n). \tag{5.6}$$

Then $w$ is a mean of the $y_i$, defined when each $y_i$ is in $J$. It is thus a mean of the $\psi(x_i)$, where each $x_i$ is in $I$.

If, further, $\psi(x)$ is a continuous increasing function of $x$, then, for a given set of $x_i$, the values of $z$ and $w$ are identical. The same is true for a given set of $n$ values $y_i$.

PROOF. If each $y_i = c$, a number in $J$, then

$$f[\phi(y_1), \ldots, \phi(y_n)] = f[\phi(c), \ldots, \phi(c)] = \psi[\phi(c)]. \tag{5.7}$$

And one value of $\psi[\phi(c)]$ is $c$, from the definition of the inverse function $\phi(y)$. Moreover, if a number $c'$ is taken in $I$, then $\psi(c')$ is some number in $J$, which we may call $c$; and the argument above is applicable. Finally, if $\psi(x)$ is continuous and increasing, then a number $x_i$ in $I$ is associated with one and only one $y_i$ in $J$, and vice versa. Thus $w$ and $z$ become identical.

In the foregoing, we started with $f$, which is not a mean of its arguments $x_i$, and obtained $g$, which is a mean of the $y_i$. Something like the reverse of this is possible. The last member of (5.2) is a mean of the $y_i$. It was obtained by treating $m$ as a constant with respect to the $x_i$. If, however, $m$ is an estimate of location and is taken as $(1/n)\Sigma x_i$, and this is substituted into (5.2), then

$$s = \{[(n-1)/n^2]\,\Sigma x_i^2 - (2/n^2)\,\Sigma x_i x_j\}^{1/2}, \qquad i < j. \tag{5.8}$$

This $s$ is now not a mean of the $x_i$, for if all the $x_i$ equal any constant $c$, then $s = 0$. Furthermore, there exists no single-valued continuous increasing function $x = \phi(y)$ such that, if $x_i = \phi(y_i)$ is substituted into (5.8), $s$ will be a mean of the $y_i$. Thus the elimination of $m$ from (5.2) interferes with the status of $s$ as a mean of the $x_i$.
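Expanding (5.2) with $m = (1/n)\Sigma x_i$ gives the coefficients $(n-1)/n^2$ and $2/n^2$, which this sketch uses. It confirms that the formula is the usual standard deviation with divisor $n$. It also shows that the formula returns 0, not $c$, when every $x_i = c$. The data are illustrative.

```python
import math
import statistics


def s_from_5_8(xs):
    """Scale estimate with m eliminated:
    {[(n-1)/n^2] * sum x_i^2 - (2/n^2) * sum_{i<j} x_i x_j}^(1/2)."""
    n = len(xs)
    squares = sum(x * x for x in xs)
    cross = sum(xs[i] * xs[j] for i in range(n) for j in range(i + 1, n))
    value = (n - 1) / n**2 * squares - 2 / n**2 * cross
    return math.sqrt(max(value, 0.0))      # guard against tiny negative rounding


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(s_from_5_8(data), statistics.pstdev(data))   # 2.0 2.0
print(s_from_5_8([3.0, 3.0, 3.0]))                 # 0.0, not 3.0: s is not a mean of the x_i
```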


6. Indeterminate Forms that arise in testing for Means. Sometimes a function $f$ is substantially continuous, but the investigation leading to the function fails to assign it a value for certain values of the argument $x$, or of the arguments $x_1, x_2, \ldots, x_n$. However, values that make the function continuous can often be assigned. This is the usual occurrence when parameters are estimated in curve fitting. In general, the measurements are assumed not to be all alike. However, when a general function such as $\Sigma x_i/n$ is obtained for location, we do not hesitate to assign to it the value $c$ when each $x_i = c$, so as to make the function continuous.

As another illustration of "indeterminate forms," consider the Jackson [30] median, $M$, of four numbers $x_1 \le x_2 \le x_3 \le x_4$, viz.,

$$M = (x_4 x_3 - x_2 x_1)/(x_4 + x_3 - x_2 - x_1). \tag{6.1}$$

A direct substitution of $x_i = c$ renders $M$ indeterminate. But if $x_i \to c$, or indeed if merely $x_2 \to c$ and $x_3 \to c$, then $M \to c$ also.
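The continuity argument for (6.1) is easy to see numerically. The sketch evaluates $M$ as the middle pair $x_2, x_3$ closes in on $c = 3$, with $x_1$ and $x_4$ held fixed. When all four values are equal, it assigns their common value, following the continuity convention described above. The numbers are illustrative.

```python
def jackson_median(x1, x2, x3, x4):
    """M = (x4*x3 - x2*x1) / (x4 + x3 - x2 - x1) for x1 <= x2 <= x3 <= x4.

    The formula is 0/0 when all four are equal; the common value is then
    assigned by continuity.
    """
    den = x4 + x3 - x2 - x1
    if den == 0:
        return x1        # with ordered inputs, den == 0 forces x1 = x2 = x3 = x4
    return (x4 * x3 - x2 * x1) / den


c = 3.0
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, jackson_median(1.0, c - eps, c + eps, 10.0))   # tends to 3.0
print(jackson_median(c, c, c, c))                              # 3.0 by continuity
```

With $x_2 = x_3 = c$ exactly and $x_1 < c < x_4$, the formula gives $c(x_4 - x_1)/(x_4 - x_1) = c$, which is why only the middle pair matters.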

In a recent paper, R. Cisbani [33] generalizes means suggested by Dunkel 
[32] and L. Galvani [34] by setting up 


n —l/z 
(6.2) y;(x) = & Dd (a’ + any | , g#0, «#0; 
t=1 


and letting $n \to \infty$. There results an integral with the value


in | Jue | 
(6.3) g(x) = lopanena| , 


for the case $x \ne j$. This mean, set up as a mean of an infinite number of variates, turns out also to be a mean of the two numbers $a$ and $b$, which becomes indeterminate for $b = a$. But as $b$ approaches $a$, $g_j(x)$ also approaches $a$. This is also true for the special cases $x = -j$, etc.

In testing whether a function $m$ of the $x_i$ is a mean of these numbers, a difficulty sometimes arises, because substituting $x_i = c$ and $m = c$ into the equation which implicitly defines $m$ puts zeros into denominators. An aid to such testing will now be formulated as a theorem, although the ideas involved are not essentially new.










THEOREM. Let $f(x)$ be a continuous increasing function of $x$ defined for each
real $x$. Let

$$f(0) = 0. \tag{6.4}$$

Given $n$ real distinct numbers

$$x_1 < x_2 < \cdots < x_n, \tag{6.5}$$

$n$ positive numbers $k_i$, and a real number $C$. Set

$$F(x) = \frac{k_1}{f(x_1 - x)} + \frac{k_2}{f(x_2 - x)} + \cdots + \frac{k_n}{f(x_n - x)} - C. \tag{6.6}$$

Then $F(x) = 0$ has $n - 1$ real roots $m_i$, such that

$$x_1 < m_1 < x_2 < m_2 < \cdots < m_{n-1} < x_n; \tag{6.7}$$

also, a root less than $x_1$, provided

$$\sum k_i / f(+\infty) < C; \tag{6.8}$$

or a root greater than $x_n$, provided

$$\sum k_i / f(-\infty) > C. \tag{6.9}$$


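PROOF. Since $f(x)$ is a continuous increasing function of $x$ with $f(0) = 0$, the
quantity $f(x_i - x)$ decreases as $x$ increases and keeps one sign on each side of
$x_i$. Hence $k_i/f(x_i - x)$ is continuous and increasing, except for the single value
$x = x_i$. So also, then, is $F(x)$, except when $x = x_1$ or $x_2$ or $\cdots$ or $x_n$. But

$$F(x_i + 0) = -\infty; \qquad F(x_{i+1} - 0) = +\infty. \tag{6.10}$$

Hence, between $x_i$ and $x_{i+1}$, there exists a root $m_i$ of $F(x) = 0$.
Moreover, since

$$F(-\infty) = \left[\sum k_i / f(+\infty)\right] - C; \qquad F(x_1 - 0) = +\infty, \tag{6.11}$$

it follows that there is a root less than $x_1$, provided (6.8) is satisfied. Likewise,
there is a root greater than $x_n$ if (6.9) is satisfied.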

The use of this theorem in testing for means is simple. Keeping the $x_i$ distinct,
the equation $F(x) = 0$ determines $(n - 1)$ numbers $m_j$, such that if
$x_i \to c$, so also do these $m_j \to c$. Employing continuity to define $m_j$ when each
$x_i = c$, we may say that each $m_j$ is a mean of $x_i$; $j = 1, 2, \ldots, (n - 1)$; $i =
1, 2, \ldots, n$, when the conditions of this theorem are satisfied. If $F(x) = 0$ has
still another root, $m$, this $m$ will not in general be a mean of $x_i$.
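The proof above is constructive. On each interval $(x_i, x_{i+1})$ the function $F$ increases from $-\infty$ to $+\infty$, so bisection locates the root $m_i$. The Python sketch below assumes $f$ is strictly increasing with $f(0) = 0$; the name `bracketed_roots` is hypothetical. The check uses $f(u) = u$, $k_i = 1$ and $C = 0$. For $x = (0, 1, 3)$, $F(x) = 0$ then reduces to $3x^2 - 8x + 3 = 0$, whose roots are $(4 \mp \sqrt{7})/3$.

```python
from typing import Callable, List, Sequence


def bracketed_roots(xs: Sequence[float], ks: Sequence[float], C: float,
                    f: Callable[[float], float]) -> List[float]:
    """Roots m_1 < ... < m_{n-1} of F(x) = sum_i k_i / f(x_i - x) - C,
    one in each interval (x_i, x_{i+1}).

    Assumes: xs strictly increasing, every k_i > 0, f continuous and
    strictly increasing with f(0) = 0. On each open interval F increases
    from -inf to +inf, so bisection brackets exactly one root.
    """
    def F(x: float) -> float:
        return sum(k / f(xi - x) for xi, k in zip(xs, ks)) - C

    roots = []
    for lo, hi in zip(xs, xs[1:]):
        a, b = lo, hi
        while True:
            mid = 0.5 * (a + b)
            if mid <= a or mid >= b:  # interval can no longer be split
                break
            if F(mid) < 0:  # still left of the root
                a = mid
            else:
                b = mid
        roots.append(0.5 * (a + b))
    return roots


print(bracketed_roots([0.0, 1.0, 3.0], [1.0, 1.0, 1.0], 0.0, lambda u: u))
```

Each root is trapped between consecutive $x_i$. Squeezing the $x_i$ toward a common value $c$ therefore squeezes every $m_i$ toward $c$, which is exactly the property used to call the $m_i$ means.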

















7. Summational Means arising in the Estimation of Parameters of Frequency
Distributions. In curve fitting, the estimation of parameters leads in general
to summational means. If the method of moments is used, the first step is to
find the moments by summation. I have already considered estimates for
location and scale by this method [7], and by the R. A. Fisher method of maximum
likelihood [4]. A further study of the results of the likelihood method will
now be made.

By this method, products which first appear are reduced to sums by logarithms,
and the means found are, in general, summational. Some idea of the
forms of these means can be obtained by examining a rather general form of
frequency function which includes the Pearson Type I, and involves parameters
with estimates $p > 0$ and $q > 0$, in addition to the location $m$ and scale $a$.
Let the observations be $x_1, x_2, \ldots, x_n$; let

$$t_i = (x_i - m)/a; \qquad 0 \le t < 1; \qquad a > 0; \tag{7.1}$$

$$y = \frac{\Gamma(p+q)}{a\,\Gamma(p)\,\Gamma(q)}\, t^{p-1}(1-t)^{q-1}. \tag{7.2}$$

The likelihood $L$ is obtained by multiplying together the $n$ factors obtained
by substituting $t = t_1, t_2, \ldots, t_n$.
Then

$$\log L = -n\log a + n\log\Gamma(p+q) - n\log\Gamma(p) - n\log\Gamma(q) + (p-1)\sum_{i=1}^{n}\log t_i + (q-1)\sum_{i=1}^{n}\log(1-t_i). \tag{7.3}$$


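From $\partial L/\partial m = 0$, there is obtained

$$\sum_{i=1}^{n}\frac{P}{x_i - m} + \sum_{i=1}^{n}\frac{Q}{x_i - m - a} = 0, \qquad P = p - 1, \quad Q = q - 1. \tag{7.4}$$

Suppose $P \ne 0$ and $Q \ne 0$; and as a first case, suppose $P + Q \ne 0$. If each
$x_i$ is replaced by $x$, the above equation leads to $m = x - (Pa)/(P + Q)$.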
Then $m$ is a summational mean of

$$x_i' = x_i - (Pa)/(P + Q), \qquad i = 1, 2, \ldots, n; \tag{7.5}$$

as seen by applying the Theorem in Section 5.
Likewise, $a$ is a summational mean of

$$x_i'' = (x_i - m)(P + Q)/P. \tag{7.6}$$

If $P \ne 0$, $Q \ne 0$, but $P + Q = 0$, then (7.4) becomes


$$\sum_{i=1}^{n}\frac{1}{x_i - m - a} = \sum_{i=1}^{n}\frac{1}{x_i - m}. \tag{7.7}$$

Now set $y_i = x_i - m$, $C = \sum 1/y_i$; and write (7.7) as

$$F(a) = \sum_{i=1}^{n}\frac{1}{y_i - a} - C = 0. \tag{7.8}$$

This has the form given in (6.6) with $x$ replaced by $a$, $k_i = 1$, $f(a) = a$. If then
$y_1 < y_2 < \cdots < y_n$, there exist $(n - 1)$ solutions $a_j$ of $F(a) = 0$ between $y_1$
and $y_n$. And thus, keeping the $y_i$ distinct, if $y_i \to c$, so also do the $a_j \to c$.
These $a_j$ are then means of $y_i$, and thus, means of $x_i - m$.
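Equation (7.8) can be handled with the same bisection idea. In this Python sketch the name `scale_roots` is hypothetical, and the $y_i$ are assumed distinct and nonzero. The choice $C = \sum 1/y_i$ also makes $a = 0$ a root of $F(a) = 0$. When all $y_i > 0$, that root lies below $y_1$, outside the intervals searched here. It is the kind of extra root which, as remarked after the Theorem, is not in general a mean. For $y = (1, 2, 4)$ the equation reduces to $a(7a^2 - 37a + 42) = 0$.

```python
from typing import List, Sequence


def scale_roots(ys: Sequence[float]) -> List[float]:
    """Roots a_1 < ... < a_{n-1} of (7.8), F(a) = sum_i 1/(y_i - a) - C with
    C = sum_i 1/y_i, one between each pair of consecutive y_i.

    Assumes the y_i = x_i - m are distinct and nonzero. This is (6.6) with
    k_i = 1 and f(a) = a, so F increases from -inf to +inf on each interval.
    """
    ys = sorted(ys)
    C = sum(1.0 / y for y in ys)

    def F(a: float) -> float:
        return sum(1.0 / (y - a) for y in ys) - C

    roots = []
    for lo, hi in zip(ys, ys[1:]):
        a, b = lo, hi
        while True:
            mid = 0.5 * (a + b)
            if mid <= a or mid >= b:  # bracket is as small as floats allow
                break
            if F(mid) < 0:
                a = mid
            else:
                b = mid
        roots.append(0.5 * (a + b))
    return roots


print(scale_roots([1.0, 2.0, 4.0]))  # about 1.6506 and 3.6351
```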





In the more general case where $P + Q \ne 0$, it is seen also that $Q$ is a summational
mean of

$$P\left[\frac{a}{x_i - m} - 1\right]. \tag{7.9}$$

From $\partial L/\partial a = 0$, quite analogous results are obtained. The special case,
now, however, is given by $P + Q + 1 = 0 = p + q - 1$. And, with the
continuity interpretation, $a$ is a mean of $x_i - m$; and moreover, $m$ is a mean of
$x_i - a$.

Using now the digamma function

$$F(u) = \frac{d}{du}\log\Gamma(u), \tag{7.10}$$

set

$$D(p) = F(p + q) - F(p). \tag{7.11}$$

The condition $\partial L/\partial p = 0$ then leads to


$$D(p) = \frac{1}{n}\sum_{i=1}^{n}(-\log t_i), \qquad 0 < t_i < 1. \tag{7.12}$$

Now, with $q > 0$, $D(\infty) = 0$, $D(-1 + 0) = \infty$; and $D(p)$ is a continuous
decreasing function of $p$, when $p > -1$. Then, since $-\log t_i > 0$, there is a
unique $p > -1$ to satisfy (7.12).
To be useful, here, $p$ should be $> 0$. But, at all events, the $p$ thus found is
a mean of $D^{-1}(-\log t_i)$, where $D^{-1}$ is inverse to $D$.
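The following Python sketch solves (7.12) for $p$ by bisection. The standard library has no digamma function, so a small one is included. It uses the recurrence $F(u) = F(u+1) - 1/u$ and the usual asymptotic series. The names `digamma` and `solve_p` are hypothetical. The search is restricted to $p > 0$, where $F(p)$ is finite and $D(p)$ decreases from $+\infty$ toward 0. For $q = 1$, $D(p) = 1/p$ exactly, which gives a convenient check.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """psi(x) = d/dx log Gamma(x) for x > 0 (the function F of (7.10)).

    Moves x up to at least 10 with psi(x) = psi(x + 1) - 1/x, then applies
    psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if x <= 0:
        raise ValueError("this sketch handles x > 0 only")
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def solve_p(ts: Sequence[float], q: float) -> float:
    """Solve D(p) = mean(-log t_i), D(p) = psi(p + q) - psi(p), for p > 0."""
    if not all(0.0 < t < 1.0 for t in ts):
        raise ValueError("need 0 < t_i < 1")
    target = sum(-math.log(t) for t in ts) / len(ts)

    def D(p: float) -> float:
        return digamma(p + q) - digamma(p)

    lo, hi = 1.0, 1.0
    while D(lo) <= target:   # D -> +inf as p -> 0+, so this stops
        lo /= 2.0
    while D(hi) >= target:   # D -> 0 as p -> inf, and target > 0
        hi *= 2.0
    for _ in range(200):     # D is decreasing: keep D(lo) > target > D(hi)
        mid = 0.5 * (lo + hi)
        if D(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When all $t_i$ share a common value $t$, the estimate depends only on $t$. This is the substitutive property that makes $p$ a mean of $D^{-1}(-\log t_i)$.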
The digamma function (7.10) appears also in estimating the parameters for
the Pearson Type III:

$$y = \frac{1}{a\,\Gamma(p+1)}\, t^{p} e^{-t}, \qquad t = (x - m)/a, \qquad p > -1. \tag{7.13}$$

















By setting $\partial L/\partial p = 0$, it is found that $m$ is the arithmetic mean of $x_i - a e^{F(p+1)}$;
$a$ is the arithmetic mean of $(x_i - m)e^{-F(p+1)}$; while $p$ is a summational mean of
$F^{-1}\{\log[(x_i - m)/a]\} - 1$, where $F^{-1}$ is the inverse of $F$. From $\partial L/\partial m = 0$, it
is found that $m$ is a summational mean of $x_i - pa$; $a$ is the harmonic mean of
$(x_i - m)/p$; and $p$ is the harmonic mean of $(x_i - m)/a$. Finally, from $\partial L/\partial a =
0$, there is obtained












$$\frac{1}{n}\sum_{i=1}^{n} x_i = m + a(p + 1), \tag{7.14}$$

which makes $m$, $a$ and $p$ each an arithmetic mean of a simple function of the
observations $x_i$, when the other two estimates are taken as constants.

Comparison of (5.2) with (5.8) has shown that after complete elimination,
estimates may cease to be means. However, it may be noted that $s$ is more
frequently exhibited in the form (5.2), where it is a mean, than in the form (5.8),
where it is not.
8. Generalizations. The extension of results from the discrete or discontinuous
case, where a mean $m$ depends upon only a finite number of elements, to the
continuous case is fairly immediate, with integration taking the place of summation,
and a distribution or frequency function taking the place of discrete weights,
$c_i$. Stieltjes and Lebesgue integrals may be used as well as Riemannian. Such
a generalization of the Chisini mean was given by de Finetti [2].

The summational mean, which I have defined as involving possibly several 
summations, may be generalized likewise. 

In terms of set functions, sometimes called functionelles, I gave [35] the following
general definition of a mean with a point set $H$ in mind as a distribution
function.

DEFINITION. Let $E$ and $H$ be sets of numbers. Such a number $t$ may be a real
number or a vector number $t = (t_1, t_2, \ldots, t_k)$.

Let $E_t$ be the result of replacing each number of $E$ by a single number $t$.

Then the mean $m$ of numbers in $E$, relative to the set $H$, and to a function $f$, is
given by $m = f(E, H)$; provided that the function $f$ has been so constructed that
for each $t$ in $E$, $f(E_t, H) = t$, or at least one value of this $f$ is $t$. It is to be
understood above that when $E$ is changed to $E_t$, the set $H$ remains unaltered.

This retains the chief feature of $f(t, t, \ldots, t) = t$ in explicit form or of
$f(t, t, \ldots, t) = f(t_1, t_2, \ldots, t_n)$ in implicit form, where $t$ is a mean of $t_1, t_2, \ldots, t_n$.

I used [36] a somewhat less general definition to discuss regression coefficients. 
All such means may well be called substitutive or representative. 
THE PRODUCT SEMI-INVARIANTS OF THE MEAN AND A
CENTRAL MOMENT IN SAMPLES


By Cecil C. Craig


The method developed by the author for calculating the semi-invariants and
product semi-invariants of moments in samples from any infinite population¹
is not immediately applicable to the calculation of product semi-invariants of
the mean and a central moment in such samples. In the present paper this
method is adapted for this purpose so that the calculation of these product
semi-invariants becomes routine. As will be seen, the computing is a little
heavier than in the case of central moments alone for results of equal weight.
A table of results up to weight ten for the mean and the second, third and fourth
central moments is given. The author plans to apply these to a further study
of the sampling characteristics of the coefficient of variation and Fisher's $t$ in
samples from non-normal populations.

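Let a random sample, $x_1, x_2, \ldots, x_N$, of $N$ observations be drawn from an
infinite population characterized by the semi-invariants $\lambda_1, \lambda_2, \lambda_3, \ldots$.
The sample mean is

$$\bar{x} = \sum_{i=1}^{N} x_i/N,$$

and the $n$-th central moment of the sample is

$$m_n = \sum_{i=1}^{N}(x_i - \bar{x})^n/N.$$

Then the product semi-invariants of order $k + l$ of $\bar{x}$ and $m_n$, $S_{kl}(\bar{x}, m_n)$, are defined
by the formal identity in the parameters $\theta$ and $\omega$:

$$(S_{10}\theta + S_{01}\omega) + \frac{1}{2!}(S_{10}\theta + S_{01}\omega)^{(2)} + \frac{1}{3!}(S_{10}\theta + S_{01}\omega)^{(3)} + \cdots = \log E\left(e^{\theta\bar{x} + \omega m_n}\right), \tag{1}$$

in which $E$ denotes the mathematical expectation over the set of all such
samples and

$$(S_{10}\theta + S_{01}\omega)^{(p)} = \sum_{j=0}^{p}\binom{p}{j} S_{j,p-j}(\bar{x}, m_n)\,\theta^{j}\omega^{p-j}.$$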


¹ “An Application of Thiele's Semi-invariants to the Sampling Problem,” Metron, Vol.
VII, part IV (1928), pp. 3-75.
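If we denote $E(\bar{x}^k m_n^l)$ by $M_{kl}$, we have by definition the further formal identity
in $\theta$ and $\omega$:

$$E\left(e^{\theta\bar{x} + \omega m_n}\right) = 1 + (M_{10}\theta + M_{01}\omega) + \frac{1}{2!}(M_{10}\theta + M_{01}\omega)^{(2)} + \cdots,$$

in which $(M_{10}\theta + M_{01}\omega)^{(p)}$ is to be expanded in the same manner as
$(S_{10}\theta + S_{01}\omega)^{(p)}$ above.
Let us write

$$\delta_i = x_i - \bar{x},$$

and then

$$E\left(e^{\theta\bar{x} + \omega m_n}\right) = E\left(e^{(\theta\sum x_i + \omega\sum\delta_i^n)/N}\right). \tag{2}$$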
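(Summations with respect to $i$ and $j$ always run from 1 to $N$.) Now we define
a new set of product semi-invariants, $\lambda_{k l_1 l_2 \cdots l_N}$, of the sum $\sum x_i$ and the $N$ $\delta_i$'s, by
means of

$$(\lambda_{10}\theta + \sum\lambda_{01}\omega_i) + \frac{1}{2!}(\lambda_{10}\theta + \sum\lambda_{01}\omega_i)^{(2)} + \cdots = \log E\left(e^{\theta\sum x_i + \sum\omega_i\delta_i}\right),$$

in which, for example,

$$\left(\lambda_{10}\theta + \sum_{i=1}^{N}\lambda_{01}\omega_i\right)^{(2)} = \lambda_{20}\theta^2 + 2\lambda_{11}\theta\omega_1 + 2\lambda_{101}\theta\omega_2 + \cdots + \lambda_{02}\omega_1^2 + \lambda_{002}\omega_2^2 + \cdots + 2\lambda_{011}\omega_1\omega_2 + \cdots.$$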


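We may write $\delta_i = \sum_j a_{ij}x_j$, with $a_{ii} = 1 - 1/N$ and $a_{ij} = -1/N$ for $i \ne j$.

Then

$$E\left(e^{\theta\sum x_i + \sum\omega_i\delta_i}\right) = E\left(e^{\sum_j \phi_j x_j}\right) = E(e^{\phi_1 x_1})\,E(e^{\phi_2 x_2})\cdots E(e^{\phi_N x_N}),$$

in which

$$\phi_j = \theta + \sum_i a_{ij}\omega_i.$$

It follows then that

$$(\lambda_{10}\theta + \sum\lambda_{01}\omega_i) + \frac{1}{2!}(\lambda_{10}\theta + \sum\lambda_{01}\omega_i)^{(2)} + \frac{1}{3!}(\lambda_{10}\theta + \sum\lambda_{01}\omega_i)^{(3)} + \cdots = \lambda_1\sum_j\phi_j + \frac{\lambda_2}{2!}\sum_j\phi_j^2 + \frac{\lambda_3}{3!}\sum_j\phi_j^3 + \cdots,$$

from which

$$(\lambda_{10}\theta + \sum\lambda_{01}\omega_i)^{(k+l)} = \lambda_{k+l}\sum_j\Big(\theta + \sum_i a_{ij}\omega_i\Big)^{k+l}.$$

From this

$$\lambda_{k00\cdots0} = \lambda_{k0} = N\lambda_k, \qquad \lambda_{k10\cdots0} = \lambda_{k010\cdots0} = \cdots = 0,$$

and generally,

$$\lambda_{k l_1 l_2 \cdots l_N} = \frac{\lambda_{k+l_1+\cdots+l_N}}{N^{l_1+\cdots+l_N}}\sum_{j=1}^{N}(-1)^{l_1+\cdots+l_N-l_j}(N-1)^{l_j}. \tag{3}$$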
This is the first result to be used in calculating values of $S_{kl}$'s. Note that the
value of $\lambda_{k l_1 l_2 \cdots l_N}$ is independent of the order in which a given set of $l_i$'s occur.

Calculation of particular $\lambda_{k l_1 l_2 \cdots l_N}$'s in terms of $N$ and the semi-invariants of
the sampled population is both simple and rapid, as one may see from a pair
of examples:

$$\lambda_{02} = \lambda_{002} = \cdots = \frac{\lambda_2}{N^2}\left[(N-1)^2 + (N-1)\right] = \frac{N-1}{N}\lambda_2$$

(suppressing superfluous zeros in the subscripts).
Then, too,

$$\lambda_{12} = \frac{N-1}{N}\lambda_3.$$

For a second example:

$$\lambda_{021} = \frac{\lambda_3}{N^3}\left[-(N-1)^2 + (N-1) - (N-2)\right] = -\frac{N-2}{N^2}\lambda_3.$$
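Formula (3) is easy to evaluate exactly. The Python sketch below returns $\lambda_{k l_1 \cdots l_N}/\lambda_{k+l_1+\cdots+l_N}$ at a given $N$ as an exact fraction; the name `lambda_ratio` is hypothetical. Zero $l_i$ are suppressed, as in the text. The checks reproduce the worked examples above. They also reproduce $\lambda_{14} = (N-1)(N^2-3N+3)\lambda_5/N^3$ and $\lambda_{122} = (2N-3)\lambda_5/N^3$, which are used in the calculation of $S_{12}(\bar x, m_2)$.

```python
from fractions import Fraction
from typing import Sequence


def lambda_ratio(ls: Sequence[int], N: int) -> Fraction:
    """lambda_{k l_1 ... l_N} / lambda_{k + L}, L = l_1 + ... + l_N, by (3):
    N**(-L) * sum_j (-1)**(L - l_j) * (N - 1)**l_j.

    `ls` lists only the nonzero l_i (their order is irrelevant); each of the
    other N - len(ls) positions has l_j = 0 and contributes (-1)**L.
    The index k does not enter the ratio.
    """
    if len(ls) > N:
        raise ValueError("more nonzero l_i than sample members")
    L = sum(ls)
    total = sum(Fraction(-1) ** (L - l) * Fraction(N - 1) ** l for l in ls)
    total += (N - len(ls)) * Fraction(-1) ** L
    return total / Fraction(N) ** L


print(lambda_ratio([2], 7), lambda_ratio([2, 1], 7))  # 6/7 -5/49
```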


Now the semi-invariants $S_{kl}$ can be expressed directly in terms of the
product moments $\nu_{k l_1 l_2 \cdots l_N}$ of the sum $\sum x_i$ and the $N$ $\delta_i$'s. These product moments
are given by the appropriate moment generating function:

$$E\left(e^{\theta\sum x_i + \sum\omega_i\delta_i}\right) = 1 + (\nu_{10}\theta + \sum\nu_{01}\omega_i) + \frac{1}{2!}(\nu_{10}\theta + \sum\nu_{01}\omega_i)^{(2)} + \cdots.$$


² As written, this result is valid if at least one of the $l_i$'s is zero, which is always the
case if $N$, the size of the sample, is greater than 1. (Cf. the author's paper cited above,
p. 17.)
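Then it is seen that

$$E\left(e^{\theta\sum x_i + \omega\sum\delta_i^n}\right) = 1 + [\nu_{10}\theta + (\Sigma\nu_{0,n})\omega] + \frac{1}{2!}[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^{(2)} + \cdots,$$

in which

$$[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^{(2)} = \nu_{20}\theta^2 + 2(\nu_{1n} + \nu_{10n} + \nu_{100n} + \cdots)\theta\omega + (\nu_{0,2n} + \nu_{00,2n} + \nu_{000,2n} + \cdots + 2\nu_{0nn} + 2\nu_{0n0n} + \cdots)\omega^2,$$

etc., and by comparison with (1) and (2), we have

$$(S_{10}\theta + S_{01}\omega) + \frac{1}{2!}(S_{10}\theta + S_{01}\omega)^{(2)} + \cdots = \log\left\{1 + \frac{1}{N}[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega] + \frac{1}{2!\,N^2}[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^{(2)} + \cdots\right\}.$$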
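From this

$$(S_{10}\theta + S_{01}\omega)^{(k+l)} = \frac{1}{N^{k+l}}\sum\frac{(-1)^{\rho-1}(\rho-1)!\,(k+l)!\,[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^{r}\,\{[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^{(2)}\}^{s}\cdots}{(1!)^{r}(2!)^{s}\cdots r!\,s!\cdots}, \tag{4}$$

in which

$$r + s + t + \cdots = \rho,$$

the summation extending over all partitions $(1^r 2^s 3^t \cdots)$ of $k + l$. This, of
course, is only the usual formula for semi-invariants in terms of moments appropriately
modified. In particular,

$$(S_{10}\theta + S_{01}\omega)^{(2)} = \frac{1}{N^2}\left\{[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^{(2)} - [\nu_{10}\theta + (\Sigma\nu_{0,n})\omega]^2\right\}.$$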





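If we write

$$[\nu_{10}\theta + (\Sigma\nu_{0,n})\omega] = W,$$

$$(S_{10}\theta + S_{01}\omega)^{(3)} = \frac{1}{N^3}\left(W^{(3)} - 3W^{(2)}W + 2W^3\right), \tag{5}$$

$$(S_{10}\theta + S_{01}\omega)^{(4)} = \frac{1}{N^4}\left(W^{(4)} - 4W^{(3)}W - 3(W^{(2)})^2 + 12W^{(2)}W^2 - 6W^4\right).$$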
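Formulae (4) and (5) are the usual relations between semi-invariants and moments, applied to $W$. As a sanity check, this Python sketch uses ordinary scalar moments in place of the symbolic powers $W^{(r)}$ and omits the factors $1/N^p$. The function names are hypothetical. It compares the combinations in (5) with the standard recursion $\kappa_r = \mu'_r - \sum_{s=1}^{r-1}\binom{r-1}{s-1}\kappa_s\mu'_{r-s}$. Two distributions with well-known moments serve as tests. The first is the Poisson distribution with mean 1, whose raw moments are 1, 2, 5, 15 and whose semi-invariants all equal 1. The second is the unit exponential, whose raw moments are $r!$ and whose semi-invariants are $(r-1)!$.

```python
from math import comb
from typing import List, Sequence


def cumulants_from_moments(mu: Sequence[int]) -> List[int]:
    """kappa[1..R] from raw moments mu[0..R] (mu[0] = 1), using
    kappa_r = mu_r - sum_{s=1}^{r-1} C(r-1, s-1) kappa_s mu_{r-s}.
    kappa[0] is an unused placeholder."""
    kappa = [0] * len(mu)
    for r in range(1, len(mu)):
        kappa[r] = mu[r] - sum(comb(r - 1, s - 1) * kappa[s] * mu[r - s]
                               for s in range(1, r))
    return kappa


def third_order(w1, w2, w3):
    """Moment combination in the first line of (5), without 1/N**3."""
    return w3 - 3 * w2 * w1 + 2 * w1 ** 3


def fourth_order(w1, w2, w3, w4):
    """Moment combination in the second line of (5), without 1/N**4."""
    return w4 - 4 * w3 * w1 - 3 * w2 ** 2 + 12 * w2 * w1 ** 2 - 6 * w1 ** 4


exponential = [1, 1, 2, 6, 24]             # raw moments r! of the unit exponential
print(cumulants_from_moments(exponential)[1:])  # [1, 1, 2, 6]
```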


Now the $\nu_{k l_1 \cdots l_N}$'s can be replaced by their values in terms of the $\lambda_{k l_1 \cdots l_N}$'s,
the details of which will be explained below, and it will be evident that any
$\nu_{k l_1 l_2 \cdots l_N}$ is unaltered by a permutation of the $l_i$'s in its subscript. Taking
account of this, the formulae (5) may be written in the expanded forms:

$$S_{11}(\bar x, m_n) = \frac{1}{N}\left[\nu_{1n} - \nu_{10}\nu_{0n}\right]$$

$$S_{21}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{2n} - \nu_{20}\nu_{0n} - 2\nu_{1n}\nu_{10} + 2\nu_{10}^2\nu_{0n}\right]$$

$$S_{12}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{1,2n} + (N-1)\nu_{1nn} - \nu_{10}\nu_{0,2n} - (N-1)\nu_{10}\nu_{0nn} - 2N\nu_{1n}\nu_{0n} + 2N\nu_{10}\nu_{0n}^2\right].$$


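But, with no loss in generality, the origin may be taken at the population mean
so that $\lambda_1 = 0$. In this case it will be found that $\nu_{10} = 0$ and these formulae
become:

$$S_{11}(\bar x, m_n) = \nu_{1n}/N$$

$$S_{21}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{2n} - \nu_{20}\nu_{0n}\right]$$

$$S_{12}(\bar x, m_n) = \frac{1}{N^2}\left[\nu_{1,2n} + (N-1)\nu_{1nn} - 2N\nu_{1n}\nu_{0n}\right]$$

$$S_{31}(\bar x, m_n) = \frac{1}{N^3}\left[\nu_{3n} - \nu_{30}\nu_{0n} - 3\nu_{1n}\nu_{20}\right]$$

$$S_{22}(\bar x, m_n) = \frac{1}{N^3}\left[\nu_{2,2n} + (N-1)\nu_{2nn} - 2N\nu_{2n}\nu_{0n} - \nu_{20}\nu_{0,2n} - (N-1)\nu_{20}\nu_{0nn} - 2N\nu_{1n}^2 + 2N\nu_{20}\nu_{0n}^2\right]$$

$$\begin{aligned} S_{13}(\bar x, m_n) = \frac{1}{N^3}\big[&\nu_{1,3n} + 3(N-1)\nu_{1,2n,n} + (N-1)(N-2)\nu_{1nnn} - 3N\nu_{1,2n}\nu_{0n} - 3N(N-1)\nu_{1nn}\nu_{0n} \\ &- 3N\nu_{1n}\nu_{0,2n} - 3N(N-1)\nu_{1n}\nu_{0nn} + 6N^2\nu_{1n}\nu_{0n}^2\big]. \end{aligned}$$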


These formulae are the second result used in the actual calculation of
$S_{kl}(\bar x, m_n)$'s. One begins with them, putting in the particular value of $n$ for
the central moment in question. If, for instance, we wish to compute the product
semi-invariants of the mean and variance in samples of $N$, we begin with the
set of formulae:

$$S_{11}(\bar x, m_2) = \nu_{12}/N$$

$$S_{21}(\bar x, m_2) = \frac{1}{N^2}\left[\nu_{22} - \nu_{20}\nu_{02}\right]$$

$$S_{12}(\bar x, m_2) = \frac{1}{N^2}\left[\nu_{14} + (N-1)\nu_{122} - 2N\nu_{12}\nu_{02}\right],$$

etc.

The second step is to replace the product moments $\nu_{k l_1 \cdots l_N}$ which appear by
their values in terms of the corresponding product semi-invariants. This process
can perhaps be best explained by some examples.
Consider the complete calculation of $S_{12}(\bar x, m_2)$. From the expression for the
fifth central moment in terms of semi-invariants,

$$\nu_5 = \lambda_5 + 10\lambda_3\lambda_2,$$

we can write the corresponding expression for product moments in terms of
product semi-invariants:

$$(\Sigma\nu)^{(5)} = (\Sigma\lambda)^{(5)} + 10(\Sigma\lambda)^{(3)}(\Sigma\lambda)^{(2)}. \tag{8}$$

Then we get $\nu_{14}$ by comparing coefficients of $\theta\omega_1^4$, and $\nu_{122}$ by comparing
coefficients of $\theta\omega_1^2\omega_2^2$, in this identity. For an index as low as 5, these coefficients
are readily picked out by inspection; for larger indices the use of Hammond
operators reduces this to a mechanical routine.³ In this case we have

$$D_3D_2(14) = (12)(02) + (03)(11).$$

To the terms on the right the appropriate binomial coefficients must be applied,
giving

$$3(12)(02) + 2(03)(11).$$

The total of these coefficients is $5 = \dfrac{5!}{4!\,1!}$, a necessary check. Then, multiplying
these coefficients by 10/5, we have

$$6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}$$

for the required coefficients in the second term in (8). Thus

$$\nu_{14} = \lambda_{14} + (6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}).$$

The two terms in parentheses arise from the same term in (8) and would both
give rise to terms in $\lambda_3\lambda_2$ in the final result if $\lambda_{11}$ were not identically zero from (3).
In practice all terms in which $\lambda_{11}$ is a factor are crossed out as they appear.
Next,

$$D_3D_2(122) = 2(12)(02) + (111)(011) + 2(021)(11).$$

($\lambda_{002} = \lambda_{02}$; $\lambda_{012} = \lambda_{021}$.) With the binomial, or multinomial, coefficients attached,
the right member is rewritten

$$6(12)(02) + 12(111)(011) + 12(021)(11).$$

³ Cf. the author, loc. cit., p. 24.

The total of these coefficients is $30 = \dfrac{5!}{2!\,2!\,1!}$, a necessary check. Then, multiplying each
coefficient by 10/30, we have

$$\nu_{122} = \lambda_{122} + (2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011} + 4\lambda_{012}\lambda_{11}).$$
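The coefficients that the text picks out by inspection, or with Hammond operators, can also be counted by brute force. Expanding a product moment in product semi-invariants amounts to summing over the set partitions of its factors. Blocks of size one are discarded because all first-order semi-invariants vanish when $\lambda_1 = 0$. This Python sketch performs that enumeration and is not the author's routine. The function names, and the tuple encoding of a factor such as $\lambda_{021}$ as `(0, 2, 1)`, are assumptions of the sketch.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple

Factor = Tuple[int, ...]


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """All set partitions of `items` (standard recursive construction)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                       # `first` on its own
        for i in range(len(part)):                   # or joined to a block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def moment_in_cumulants(exponents: Sequence[int]) -> Dict[Tuple[Factor, ...], int]:
    """Expand nu_{k l_1 l_2 ...} in product semi-invariants (lambda_1 = 0).

    exponents[0] is the power k of sum(x_i); exponents[1:] are the powers
    l_1, l_2, ... of delta_1, delta_2, .... Each factor is keyed as
    (k, nonzero l's in decreasing order), relying on the invariance of
    lambda under permutation of the l's.
    """
    labels = [var for var, power in enumerate(exponents) for _ in range(power)]
    counts: Counter = Counter()
    for partition in set_partitions(list(range(len(labels)))):
        if any(len(block) == 1 for block in partition):
            continue  # first-order semi-invariants are zero
        factors = []
        for block in partition:
            per_var = [0] * len(exponents)
            for idx in block:
                per_var[labels[idx]] += 1
            ls = sorted((c for c in per_var[1:] if c), reverse=True)
            factors.append((per_var[0], *ls))
        counts[tuple(sorted(factors, reverse=True))] += 1
    return dict(counts)


print(moment_in_cumulants([1, 4]))
# {((1, 4),): 1, ((1, 2), (0, 2)): 6, ((1, 1), (0, 3)): 4}
```

The result for `[1, 2, 2]` reproduces the coefficients 2, 4, 4 found above for $\nu_{122}$. The call `[0, 5]` recovers $\nu_5 = \lambda_5 + 10\lambda_3\lambda_2$.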
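Going on with the calculation of $S_{12}(\bar x, m_2)$:

$$\nu_{12} = \lambda_{12}; \qquad \nu_{02} = \lambda_{02};$$

and then we have:

$$S_{12}(\bar x, m_2) = \frac{1}{N^2}\Big[\{\lambda_{14} + (N-1)\lambda_{122}\} + \{6\lambda_{12}\lambda_{02} + (N-1)(2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011}) - 2N\lambda_{12}\lambda_{02}\}\Big].$$

The first set of terms within braces gives rise to terms in $\lambda_5$; the second to terms
in $\lambda_3\lambda_2$. Next

$$\lambda_{14} = \frac{(N-1)(N^2 - 3N + 3)}{N^3}\lambda_5, \qquad \lambda_{122} = \frac{2N-3}{N^3}\lambda_5,$$

$$\lambda_{03} = \frac{(N-1)(N-2)}{N^2}\lambda_3, \qquad \lambda_{021} = -\frac{N-2}{N^2}\lambda_3.$$

This table of values will be of frequent use in further calculations of $S_{kl}$'s.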
Giving the values of both Ai and Agu here, was unnecessary duplication. 
Now only the final reduction is to be carried out. We obtain

$$S_{12}(\bar x, m_2) = \frac{N-1}{N^4}\left[(N-1)\lambda_5 + 4N\lambda_3\lambda_2\right].$$


This result of order 3 and of weight 5 follows a quite mechanical procedure
and is quite brief. The length of the algebraic computations required grows
rapidly as the weight is increased, but for weights no greater than 10 undue labor
is not required. For greater weights only time and patience are required to get
results if they are needed. It is to be noted that by this method one may
calculate individual terms in the result without doing any of the work required
for the remaining terms, and that one may readily shorten the work by getting
results to a desired degree of approximation with respect to powers of $1/N$.

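There follows a table of the results so far calculated.

For $n = 2$:

$$S_{11} = \frac{N-1}{N^2}\lambda_3$$

$$S_{21} = \frac{N-1}{N^3}\lambda_4$$

$$S_{12} = \frac{N-1}{N^4}\left[(N-1)\lambda_5 + 4N\lambda_3\lambda_2\right]$$

$$S_{31} = \frac{N-1}{N^4}\lambda_5$$

$$S_{22} = \frac{N-1}{N^5}\left[(N-1)\lambda_6 + 4N(\lambda_4\lambda_2 + \lambda_3^2)\right]$$

$$S_{13} = \frac{N-1}{N^6}\left[(N-1)^2\lambda_7 + 12N(N-1)\lambda_5\lambda_2 + 4N(5N-7)\lambda_4\lambda_3 + 24N^2\lambda_3\lambda_2^2\right]$$

It is not difficult to see that in general

$$S_{k1}(\bar x, m_2) = \frac{N-1}{N^{k+1}}\lambda_{k+2}.$$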
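The cases $k = 1, 2$ of the general formula for $S_{k1}(\bar x, m_2)$ can be checked exactly on a toy population. This Python sketch assumes a population in which each of a few integer values is equally likely. It enumerates every sample of size $N$ drawn with replacement, which mimics an infinite population. It then computes the joint semi-invariants of $\bar x$ and $m_2$ with exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence, Tuple


def population_cumulants(values: Sequence[int]) -> Tuple[Fraction, Fraction, Fraction]:
    """lambda_2, lambda_3, lambda_4 of a population of equally likely values."""
    n = len(values)
    mean = Fraction(sum(values), n)
    mu = [sum((Fraction(v) - mean) ** r for v in values) / n for r in range(5)]
    return mu[2], mu[3], mu[4] - 3 * mu[2] ** 2


def mean_and_m2_cumulants(values: Sequence[int], N: int) -> Tuple[Fraction, Fraction]:
    """Exact S_11(xbar, m_2) and S_21(xbar, m_2) over all samples of size N."""
    samples = list(product(values, repeat=N))
    weight = Fraction(1, len(samples))
    pairs = []
    for s in samples:
        xbar = Fraction(sum(s), N)
        m2 = sum((Fraction(x) - xbar) ** 2 for x in s) / N
        pairs.append((xbar, m2))
    e_x = sum(x for x, _ in pairs) * weight
    e_m = sum(m for _, m in pairs) * weight
    # joint semi-invariants of orders 2 and 3 equal the joint central moments
    s11 = sum((x - e_x) * (m - e_m) for x, m in pairs) * weight
    s21 = sum((x - e_x) ** 2 * (m - e_m) for x, m in pairs) * weight
    return s11, s21


lam2, lam3, lam4 = population_cumulants([0, 1, 3])
s11, s21 = mean_and_m2_cumulants([0, 1, 3], 3)
print(s11 == Fraction(2, 9) * lam3, s21 == Fraction(2, 27) * lam4)  # True True
```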










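For $n = 3$:

$$S_{11} = \frac{(N-1)(N-2)}{N^3}\lambda_4$$

$$S_{21} = \frac{(N-1)(N-2)}{N^4}\lambda_5$$

$$S_{12} = \frac{(N-1)(N-2)}{N^6}\left[(N-1)(N-2)\lambda_7 + 9N(N-2)\lambda_5\lambda_2 + 27N(N-2)\lambda_4\lambda_3 + 18N^2\lambda_3\lambda_2^2\right]$$

$$S_{31} = \frac{(N-1)(N-2)}{N^5}\lambda_6$$

$$S_{22} = \frac{(N-1)(N-2)}{N^7}\left[(N-1)(N-2)\lambda_8 + 9N(N-2)\lambda_6\lambda_2 + 36N(N-2)\lambda_5\lambda_3 + 27N(N-2)\lambda_4^2 + 18N^2\lambda_4\lambda_2^2 + 36N^2\lambda_3^2\lambda_2\right]$$

$$\begin{aligned} S_{13} = \frac{(N-1)(N-2)}{N^{10}}\big[&N(N-1)^2(N-2)^2\lambda_{10} + 9(N-1)(3N^4 - 12N^3 + 12N^2 - 5N + 5)\lambda_8\lambda_2 \\ &+ 27N(4N^4 - 21N^3 + 36N^2 - 20N + 3)\lambda_7\lambda_3 + 27N^2(N-2)^2(7N-11)\lambda_6\lambda_4 \\ &+ 54N^3(N-2)(4N-7)\lambda_6\lambda_2^2 + 27N^2(N-2)^2(4N-7)\lambda_5^2 \\ &+ 54N^3(N-2)(23N-50)\lambda_5\lambda_3\lambda_2 + 162N^3(N-2)(5N-12)\lambda_4^2\lambda_2 \\ &+ 54N^3(29N^2 - 126N + 140)\lambda_4\lambda_3^2 + 108N^4(5N-12)\lambda_4\lambda_2^3 \\ &+ 324N^4(5N-12)\lambda_3^2\lambda_2^2\big]. \end{aligned}$$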

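For $n = 4$:

$$S_{11} = \frac{N-1}{N^4}\left[(N^2 - 3N + 3)\lambda_5 + 6N(N-1)\lambda_3\lambda_2\right]$$

$$S_{21} = \frac{N-1}{N^5}\left[(N^2 - 3N + 3)\lambda_6 + 6N(N-1)(\lambda_4\lambda_2 + \lambda_3^2)\right]$$

$$\begin{aligned} S_{12} = \frac{N-1}{N^8}\big[&(N-1)(N^2 - 3N + 3)^2\lambda_9 + 4N(N^2 - 3N + 3)(7N^2 - 18N + 15)\lambda_7\lambda_2 \\ &+ 4N(N^2 - 3N + 3)(19N^2 - 66N + 63)\lambda_6\lambda_3 \\ &+ 4N(29N^4 - 195N^3 + 537N^2 - 639N + 351)\lambda_5\lambda_4 \\ &+ 12N^2(17N^3 - 71N^2 + 117N - 69)\lambda_5\lambda_2^2 \\ &+ 24N^2(35N^3 - 173N^2 + 309N - 189)\lambda_4\lambda_3\lambda_2 \\ &+ 72N^2(N-2)^2(3N-5)\lambda_3^3 + 96N^3(4N^2 - 9N + 6)\lambda_3\lambda_2^3\big] \end{aligned}$$

$$S_{31} = \frac{N-1}{N^6}\left[(N^2 - 3N + 3)\lambda_7 + 6N(N-1)\lambda_5\lambda_2 + 18N(N-1)\lambda_4\lambda_3\right]$$

$$\begin{aligned} S_{22} = \frac{N-1}{N^9}\big[&(N-1)(N^2 - 3N + 3)^2\lambda_{10} + 4N(N^2 - 3N + 3)(7N^2 - 18N + 15)\lambda_8\lambda_2 \\ &+ 8N(N^2 - 3N + 3)(13N^2 - 42N + 39)\lambda_7\lambda_3 \\ &+ 12N(16N^4 - 106N^3 + 285N^2 - 360N + 180)\lambda_6\lambda_4 \\ &+ 12N^2(17N^3 - 71N^2 + 117N - 69)\lambda_6\lambda_2^2 \\ &+ 4N(29N^4 - 195N^3 + 537N^2 - 693N + 351)\lambda_5^2 \\ &+ 48N^2(26N^3 - 125N^2 + 213N - 129)\lambda_5\lambda_3\lambda_2 \\ &+ 24N^2(35N^3 - 173N^2 + 309N - 189)\lambda_4^2\lambda_2 \\ &+ 24N^2(62N^3 - 326N^2 + 597N - 369)\lambda_4\lambda_3^2 \\ &+ 96N^3(4N^2 - 9N + 6)\lambda_4\lambda_2^3 + 288N^3(4N^2 - 9N + 6)\lambda_3^2\lambda_2^2\big]. \end{aligned}$$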
unpublished paper, he proved among other things that the A test of the general
linear hypothesis is the most powerful of all those, the power function of which
depends on the same argument as that of the A test and not on other parameters.
The above circumstances suggest the following problem: to see whether it is
possible to devise a test of “Student's” hypothesis such that its power function
would be independent of $\sigma$. If such a test could be devised and proved to be
reasonably powerful, then the tables of its power function could be used for the
purpose of planning experiments.

The purpose of the present paper is to show that no such test exists and,
consequently, this negative result implies in still another way that it is impossible
to improve on the test originally suggested by “Student.”












2. Statement of the Problem. The problem of finding a test whose power
function is independent of $\sigma$ is equivalent to finding a critical region $w$ such
that the value of the power function

$$\beta(\xi, \sigma) = P\{E \in w \mid \xi, \sigma\} \tag{4}$$

for any fixed $\xi$ is independent of the value of $\sigma$, where $E$ denotes the sample
point $(x_1, x_2, \ldots, x_n)$. We shall show specifically that if this is the case, then
the power function is also independent of $\xi$; so that the test will reject the hypothesis
tested with the same frequency independently of whether it be correct
or wrong.














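3. THEOREM. If there exists a region $w$ such that, whatever be the value of $\sigma$,

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \int\!\cdots\!\int_w e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \xi_0)^2}\, dx_1\, dx_2 \cdots dx_n = \alpha, \tag{5}$$

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \int\!\cdots\!\int_w e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \xi_1)^2}\, dx_1\, dx_2 \cdots dx_n = \beta, \tag{6}$$

where $\xi_1 \ne \xi_0$ and $\alpha, \beta$ are constants, then

$$\alpha = \beta. \tag{7}$$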










A region $w$ is called similar [1] to the whole sample space, $W$, of size $\alpha$, with
respect to a set of elementary probability laws $p(E \mid \theta)$ given in terms of a
parameter $\theta$, if $P\{E \in w \mid \theta\} = \alpha$, whatever be the value of $\theta$. Essentially,
then, the region $w$ above is a similar region with respect to two different sets
of elementary laws, each being given parametrically in terms of the parameter $\sigma$.

Denote by $w_r$ the portion of the surface of the hypersphere $\sum_{i=1}^{n}(x_i - \xi_0)^2 = r^2$
which is common to $w$, and let the total surface be denoted by $W_r$. Neyman
and Pearson have shown [1] that a necessary and sufficient condition that $w$
be a similar region, in the above case, is that, whatever be $r$, the probability
that the sample point $E$ will fall on the subsurface $w_r$, when it is known that
the sample point lies on the surface $W_r$, is $\alpha$, i.e.

$$P\{E \in w_r \mid (E \in W_r)(\xi = \xi_0)\} = \alpha \tag{8}$$

for all $r$.
In a similar manner let $w_\rho$ denote the portion of the surface of the hypersphere
$\sum_{i=1}^{n}(x_i - \xi_1)^2 = \rho^2$ common to $w$, and let the total surface be denoted
by $W_\rho$. Since $w$ is similar with respect to the set of probability laws indicated in (6), we
have also

$$P\{E \in w_\rho \mid (E \in W_\rho)(\xi = \xi_1)\} = \beta \tag{9}$$

for all $\rho$.
Since on the surface $W_r$ the elementary probability law,

$$p(E \mid \xi_0, \sigma) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n e^{-r^2/2\sigma^2}, \tag{10}$$

is constant, we see that an equivalent statement of (8) is that the hyper-area of
$w_r$ is a constant proportion, $\alpha$, of the total hyper-area $W_r$. Similarly, from (9),
we have that the hyper-area of $w_\rho$ is a constant proportion, $\beta$, of the area of the
hypersurface $W_\rho$, whatever be the values of $r$ and $\rho$.

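Consider the transformation which expresses $x_1, x_2, \ldots, x_n$ in terms of generalized
polar coordinates with pole at the point $(\xi_0, \xi_0, \ldots, \xi_0)$, i.e.

$$\begin{aligned} x_1 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-1}\cos\theta_n \\ x_2 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-1}\sin\theta_n \\ x_3 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\sin\theta_{n-1} \\ &\;\;\vdots \\ x_{n-1} - \xi_0 &= r\cos\theta_2\sin\theta_3 \\ x_n - \xi_0 &= r\sin\theta_2. \end{aligned} \tag{11}$$

Let $\Delta$ be the Jacobian of the transformation:

$$|\Delta| = r^{n-1}\left|\prod_{i=2}^{n}\cos^{i-2}\theta_{n+2-i}\right| = r^{n-1}T(\theta_i). \tag{12}$$

Consider also a transformation which expresses $(x_1, x_2, \ldots, x_n)$ in terms of polar
coordinates, the point $(\xi_1, \xi_1, \ldots, \xi_1)$ being pole. It may be obtained by
replacing in (11) $\xi_0$ by $\xi_1$, $r$ by $\rho$, and $\theta_i$ by $\phi_i$. The Jacobian of this transformation
is given by $|\Delta| = \rho^{n-1}T(\phi_i)$.

We are now able to express the hyper-area of $W_r$:

$$\int\!\cdots\!\int_{W_r}|\Delta|\, d\theta_2\, d\theta_3\cdots d\theta_n = r^{n-1}\int\!\cdots\!\int_{W_r}T(\theta_i)\, d\theta_2\, d\theta_3\cdots d\theta_n = Kr^{n-1}, \tag{13}$$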
ON THE NON-EXISTENCE OF TESTS OF “STUDENT'S” HYPOTHESIS
HAVING POWER FUNCTIONS INDEPENDENT OF σ


By George B. Dantzig


1. Introduction. Consider a system of $n$ random variables $x_1, x_2, \ldots, x_n$,
where each is known to be normally distributed about the same but unknown
mean, $\xi$, and with the same, but also unknown, standard deviation $\sigma$. The
assumption, $H_0$, that $\xi$ has some specified value, $\xi_0$, e.g. $\xi_0 = 0$, while nothing
is assumed about $\sigma$, is known as the “Student” Hypothesis. Two aspects of
the hypothesis $H_0$ have been already studied extensively. If the alternatives
with respect to which it is desired to test $H_0$ assume specifically that $\xi > \xi_0$
(or $\xi < \xi_0$), then we have the so-called asymmetric case of “Student's” Hypothesis,
and it is known [1] that there exists a uniformly most powerful test of $H_0$.
This consists in the rule, originally suggested by “Student,” of rejecting $H_0$
whenever

$$t = \frac{(\bar{x} - \xi_0)\sqrt{n-1}}{S} \ge t_\alpha, \tag{1}$$

where $\bar{x}$ and $S$ denote the mean and the standard deviation of the observed
$x_i$'s and $t_\alpha$ is taken, for example, from Fisher's Tables [2] with his $P = 2\alpha$.
In other words $t_\alpha$ is such that

$$P\{t \ge t_\alpha \mid H_0\} = \alpha, \tag{2}$$

where $\alpha$ is the chosen level of significance. In accordance with the definition
of the uniformly most powerful test, whenever any other rule, $R$, offered to test
the same hypothesis $H_0$ has the same probability $\alpha$ of $H_0$ being rejected when
it is true, the power of this alternative test cannot exceed that of “Student's”
Test. In other words, if it happens that the true value of $\xi$ is not equal to $\xi_0$
but is greater, then the probability of this circumstance being detected by
“Student's” test is at least equal to that corresponding to the rule $R$.

If the set of alternative hypotheses is not limited to those specifying the value of ξ either greater or smaller than $\xi_0$, but includes both those categories, then it is known [1] that there is no uniformly most powerful test of the hypothesis $H_0$. However, in this case there exists a slightly different test, also based on “Student’s” criterion t, possessing the remarkable property of being unbiased of type $B_1$ [3]. The test, in common use for a long time, consists in rejecting $H_0$ when

$$|t| > t_\alpha, \tag{3}$$

with $t_\alpha$ being taken again from Fisher’s tables, this time corresponding to his $P = \alpha$, where α is the chosen level of significance.

In order to describe the optimum property of this test we must use the concept of the power function of a test [3]. Denote by β(ξ, σ) the probability of the hypothesis $H_0$ being rejected when ξ and σ are the true mean and the true standard error of the observable $x_i$’s. The function β(ξ, σ) is just what is called the power function of the test. If we substitute $\xi = \xi_0$, then we shall have $\beta(\xi_0, \sigma) = \alpha$ irrespective of the value of σ. Now the optimum property of “Student’s” test mentioned above consists in that (1) its power function has a minimum at $\xi = \xi_0$, and this is true whatever be the value of σ; and (2) whatever be any other test of the same hypothesis which has the same level of significance α and has property (1), its power function β′(ξ, σ) cannot exceed that of “Student’s” test.

These two properties, demonstrating the excellence of the criterion suggested by “Student,” fully justify the general confidence in the test as described above, or in its extended form where it is applied to two or more samples. However, it is known that “Student’s” test in both its forms, $t > t_\alpha$ and $|t| > t_\alpha$, has one very undesirable property which causes great difficulties in various problems of rational planning of experiments.

One of the most important questions to have in mind when planning an experiment is: What is the probability that the experiment and the subsequent statistical test will detect a difference or effect when it actually exists? If we perform an experiment and then apply some statistical analysis to test “Student’s” hypothesis that $\xi = \xi_0$, we do hope that, if the actual value of ξ is different from $\xi_0$, the test will discover this circumstance. But apart from mere hope, it is desirable to take precautions so that when the difference, $\xi - \xi_0 = \Delta$, has some appreciable value, the chance of the hypothesis $H_0$ being rejected will be reasonably large. This may be done by calculating the value of the power function β(ξ, σ) corresponding to the value $\xi = \xi_0 + \Delta$. And here we come to the unfortunate property of “Student’s” test.

Although the form of the power function of “Student’s” test is known and tabled [4], [5], [6], [7], there are occasionally considerable difficulties in applying these tables, because n and Δ are not its only arguments: it also depends on σ. Consequently, in order to have an idea of the probability that the test will detect the falsehood of the hypothesis $H_0$ that $\xi = \xi_0$ when actually $\xi = \xi_0 + \Delta$, we need not only the knowledge of n but also a likely value of σ. The latter is known accurately only in exceptional cases, and in those cases one would apply a test which is different from “Student’s” test. Usually we have only a vague notion of the magnitude of σ, and accordingly the tables of β(ξ, σ) may be used to obtain a rough idea as to whether the arrangement of the experiment planned is satisfactory or not. Frequently we have no idea of what may be the values of σ.
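The dependence on σ can be seen numerically. The following Monte Carlo sketch in Python is illustrative only and is not from the paper: it assumes a one-sided test at level α, estimates $t_\alpha$ empirically from simulated null samples instead of reading it from Fisher’s table, and uses hypothetical helper names (`student_t`, `power_one_sided`). For a fixed difference Δ the estimated power falls as σ grows, since Δ and σ enter the power only through the ratio Δ/σ.

```python
import math
import random

def student_t(xs, xi0):
    """Statistic (1): t = (mean - xi0) * sqrt(n - 1) / S, with S using divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    S = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return (mean - xi0) * math.sqrt(n - 1) / S

def simulate_t(n, xi, sigma, xi0, reps, rng):
    """t computed on `reps` normal samples of size n with mean xi and s.d. sigma."""
    return [student_t([rng.gauss(xi, sigma) for _ in range(n)], xi0)
            for _ in range(reps)]

def power_one_sided(n, delta, sigma, alpha=0.05, reps=10000, seed=1):
    """Estimated probability that t > t_alpha when xi = xi0 + delta (xi0 = 0)."""
    rng = random.Random(seed)
    # Empirical stand-in for Fisher's table; the null law of t is free of sigma.
    null = sorted(simulate_t(n, 0.0, 1.0, 0.0, reps, rng))
    t_alpha = null[int((1 - alpha) * reps)]
    alt = simulate_t(n, delta, sigma, 0.0, reps, rng)
    return sum(t > t_alpha for t in alt) / reps

for sigma in (0.5, 1.0, 3.0):
    print(sigma, power_one_sided(n=10, delta=1.0, sigma=sigma))
```

With n = 10 and Δ = 1, the printed power is high for σ = 1 and much lower for σ = 3, which is exactly the planning difficulty described above.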

To Dr. P. L. Hsu is due the idea of looking for tests the power of which is independent of the parameters unspecified by the hypothesis tested. In an unpublished paper, he proved among other things that the λ test of the general linear hypothesis is the most powerful of all those tests the power function of which depends on the same argument as that of the λ test and not on other parameters. The above circumstances suggest the following problem: to see whether it is possible to devise a test of “Student’s” hypothesis such that its power function would be independent of σ. If such a test could be devised and proved to be reasonably powerful, then the tables of its power function could be used for the purpose of planning experiments.

The purpose of the present paper is to show that no such test exists; consequently, this negative result implies in still another way that it is impossible to improve on the test originally suggested by “Student.”


2. Statement of the Problem. The problem of finding a test whose power function is independent of σ is equivalent to finding a critical region w such that the value of the power function

$$\beta(\xi, \sigma) = P\{E \in w \mid \xi, \sigma\} \tag{4}$$

for any fixed ξ is independent of the value of σ, where E denotes the sample point $(x_1, x_2, \ldots, x_n)$. We shall show specifically that if this is the case, then the power function is also independent of ξ, so that the test will reject the hypothesis tested with the same frequency independently of whether it be correct or wrong.


3. Theorem. If there exists a region w such that, whatever be the value of σ,

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\int\cdots\int_{w} e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\xi_0)^{2}}\,dx_1\,dx_2\cdots dx_n = \alpha, \tag{5}$$

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\int\cdots\int_{w} e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\xi_1)^{2}}\,dx_1\,dx_2\cdots dx_n = \beta, \tag{6}$$

where $\xi_1 \neq \xi_0$ and α, β are constants, then

$$\alpha = \beta. \tag{7}$$

A region w is called similar [1] to the whole sample space, W, of size α, with respect to a set of elementary probability laws $p(E \mid \theta)$ given in terms of a parameter θ, if $P\{E \in w \mid \theta\} = \alpha$, whatever be the value of θ. Essentially, then, the region w above is a similar region with respect to two different sets of elementary laws, each being given parametrically in terms of the parameter σ.

Denote by $w_r$ the portion of the surface of the hypersphere $\sum_{i=1}^{n}(x_i - \xi_0)^2 = r^2$ which is common to w, and let the total surface be denoted by $W_r$. Neyman and Pearson have shown [1] that a necessary and sufficient condition that w be a similar region, in the above case, is that, whatever be r, the probability that the sample point E will fall on the subsurface $w_r$, when it is known that the sample point lies on the surface $W_r$, is α, i.e.

$$P\{E \in w_r \mid (E \in W_r)(\xi = \xi_0)\} = \alpha \tag{8}$$

for all r.
In a similar manner let $w_\rho$ denote the portion of the surface of the hypersphere $\sum_{i=1}^{n}(x_i - \xi_1)^2 = \rho^2$ common to w, and let the total surface be denoted by $W_\rho$. Since w is similar with respect to the set of probability laws indicated in (6), we also have

$$P\{E \in w_\rho \mid (E \in W_\rho)(\xi = \xi_1)\} = \beta \tag{9}$$

for all ρ.
Since on the surface $W_r$ the elementary probability law

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\xi_0)^{2}} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{r^{2}}{2\sigma^{2}}} \tag{10}$$

is constant, we see that an equivalent statement of (8) is that the hyper-area of $w_r$ is a constant proportion, α, of the total hyper-area of $W_r$. Similarly, from (9), the hyper-area of $w_\rho$ is a constant proportion, β, of the area of the hypersurface $W_\rho$, whatever be the values of r and ρ.

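Consider the transformation which expresses $x_1, x_2, \ldots, x_n$ in terms of generalized polar coordinates with pole at the point $(\xi_0, \xi_0, \ldots, \xi_0)$, i.e.

$$\begin{aligned}
x_1 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\cos\theta_{n-1}\cos\theta_n,\\
x_2 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\cos\theta_{n-1}\sin\theta_n,\\
x_3 - \xi_0 &= r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\sin\theta_{n-1},\\
&\;\;\vdots\\
x_{n-1} - \xi_0 &= r\cos\theta_2\sin\theta_3,\\
x_n - \xi_0 &= r\sin\theta_2.
\end{aligned} \tag{11}$$

Let Δ be the Jacobian of the transformation:

$$|\Delta| = r^{n-1}\left|\prod_{i=2}^{n}\cos^{i-2}\theta_{n+2-i}\right| = r^{n-1}\,T(\theta_i). \tag{12}$$

Consider also a transformation which expresses $(x_1, x_2, \ldots, x_n)$ in terms of polar coordinates, the point $(\xi_1, \xi_1, \ldots, \xi_1)$ being the pole. It may be obtained by replacing in (11) $\xi_0$ by $\xi_1$, r by ρ, and $\theta_i$ by $\theta_i'$. The Jacobian of this transformation is given by $|\Delta'| = \rho^{n-1}T(\theta_i')$.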
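The Jacobian (12) can be checked numerically. The Python sketch below is illustrative only (all function names are hypothetical): it builds the map (11) with the pole at the origin, differentiates it by central differences, and compares the absolute determinant with $r^{n-1}\prod_{i=2}^{n}\cos^{i-2}\theta_{n+2-i}$, i.e. a factor $\cos^{n-j}\theta_j$ for each angle $\theta_j$.

```python
import math

def polar_to_cartesian(r, thetas):
    """Map (11) with the pole at the origin; thetas = [theta_2, ..., theta_n]."""
    n = len(thetas) + 1
    th = {i: thetas[i - 2] for i in range(2, n + 1)}          # th[i] is theta_i
    x = [r * math.prod(math.cos(th[i]) for i in range(2, n + 1))]
    for j in range(2, n + 1):
        # x_j = r cos(theta_2) ... cos(theta_{n+1-j}) sin(theta_{n+2-j})
        x.append(r * math.prod(math.cos(th[i]) for i in range(2, n + 2 - j))
                 * math.sin(th[n + 2 - j]))
    return x

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size, result = len(m), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda i: abs(m[i][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, size):
            f = m[i][c] / m[c][c]
            for j in range(c, size):
                m[i][j] -= f * m[c][j]
    return result

def numeric_jacobian(r, thetas, h=1e-6):
    """|det d(x_1..x_n)/d(r, theta_2..theta_n)| by central differences."""
    v = [r] + list(thetas)
    cols = []
    for k in range(len(v)):
        up, dn = v[:], v[:]
        up[k] += h
        dn[k] -= h
        xu = polar_to_cartesian(up[0], up[1:])
        xd = polar_to_cartesian(dn[0], dn[1:])
        cols.append([(a - b) / (2 * h) for a, b in zip(xu, xd)])
    return abs(det([list(row) for row in zip(*cols)]))

def jacobian_12(r, thetas):
    """Right-hand side of (12)."""
    n = len(thetas) + 1
    th = {i: thetas[i - 2] for i in range(2, n + 1)}
    return r ** (n - 1) * abs(math.prod(math.cos(th[n + 2 - i]) ** (i - 2)
                                        for i in range(2, n + 1)))

print(numeric_jacobian(1.7, [0.3, -0.8, 1.1]), jacobian_12(1.7, [0.3, -0.8, 1.1]))
```

The same check also confirms that (11) places the point on the sphere of radius r, which is what makes the surfaces $W_r$ in (13) hyperspheres.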

We are now able to express the hyper-area of $W_r$:

$$\int\cdots\int_{W_r}|\Delta|\,d\theta_2\,d\theta_3\cdots d\theta_n = r^{n-1}\int\cdots\int_{W_r}T(\theta_i)\,d\theta_2\,d\theta_3\cdots d\theta_n = K r^{n-1}, \tag{13}$$

where the integral K > 0 is a constant independent of r. Similarly the hyper-area of $W_\rho$ is $K\rho^{n-1}$, where K is the same as in (13). According to (8) and (9) we now have

$$\int\cdots\int_{w_r}|\Delta|\,d\theta_2\,d\theta_3\cdots d\theta_n = \alpha K r^{n-1}, \tag{14}$$

$$\int\cdots\int_{w_\rho}|\Delta'|\,d\theta_2'\,d\theta_3'\cdots d\theta_n' = \beta K \rho^{n-1}. \tag{15}$$

Let us consider the distances between the three points $(x_1, x_2, \ldots, x_n)$, $(\xi_0, \xi_0, \ldots, \xi_0)$, and $(\xi_1, \xi_1, \ldots, \xi_1)$. The distances of the first point to the second point and to the third point we have already denoted by r and ρ. Let the distance between the last two be L; then, since the sum of two sides of a triangle is at least equal to the third side, we have

$$r \le \rho + L, \qquad \rho \le r + L, \qquad \text{where } L = \sqrt{n}\,|\xi_1 - \xi_0|. \tag{16}$$





Let $\varphi(t) \ge 0$ be an arbitrary monotonic nonincreasing function of t such that the product $t^{n-1}\varphi(t)$ is integrable from 0 to +∞. Since φ(t) is nonincreasing, it follows from (16) that

$$\varphi(r) \ge \varphi(\rho + L) \quad\text{and}\quad \varphi(\rho) \ge \varphi(r + L). \tag{17}$$
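Consider the integral I:

$$I = \int\cdots\int_{w}\varphi(r)\,dx_1\,dx_2\cdots dx_n. \tag{18}$$

We shall express it in terms of the variables $r, \theta_2, \ldots, \theta_n$ and also in terms of $\rho, \theta_2', \ldots, \theta_n'$, and compare the results. Thus

$$I = \int\cdots\int_{w}\varphi(r)\,|\Delta|\,dr\,d\theta_2\cdots d\theta_n = \int_0^{\infty}\varphi(r)\,dr\int\cdots\int_{w_r}|\Delta|\,d\theta_2\cdots d\theta_n = \alpha K\int_0^{\infty}r^{n-1}\varphi(r)\,dr. \tag{19}$$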


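Also we have, by (17),

$$I = \int\cdots\int_{w}\varphi(r)\,|\Delta'|\,d\rho\,d\theta_2'\cdots d\theta_n' \ge \int\cdots\int_{w}\varphi(\rho + L)\,|\Delta'|\,d\rho\,d\theta_2'\cdots d\theta_n' = \int_0^{\infty}\varphi(\rho + L)\,d\rho\int\cdots\int_{w_\rho}|\Delta'|\,d\theta_2'\cdots d\theta_n', \tag{20}$$

and consequently

$$I \ge \beta K\int_0^{\infty}\rho^{n-1}\varphi(\rho + L)\,d\rho. \tag{21}$$

Since K > 0, we have from (19) and (21)

$$\frac{\alpha}{\beta} \ge \frac{\int_0^{\infty}t^{n-1}\varphi(t + L)\,dt}{\int_0^{\infty}t^{n-1}\varphi(t)\,dt}. \tag{22}$$

By interchanging ρ and r in (18), (19), (20), and (21) we have also

$$\frac{\beta}{\alpha} \ge \frac{\int_0^{\infty}t^{n-1}\varphi(t + L)\,dt}{\int_0^{\infty}t^{n-1}\varphi(t)\,dt}. \tag{23}$$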


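Let us set in (22) and (23) $\varphi(t) = e^{-pt}$, so that $\varphi(t + L) = e^{-pL}e^{-pt}$, where p > 0 is arbitrary. Then

$$\frac{\alpha}{\beta} \ge e^{-pL} \quad\text{and}\quad \frac{\beta}{\alpha} \ge e^{-pL}. \tag{24}$$

Since (24) holds for all p > 0, let p approach zero. Then $\lim_{p \to 0} e^{-pL} = 1$, and the above inequalities can hold only if

$$\alpha = \beta. \qquad \text{Q.E.D.} \tag{25}$$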
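A short worked step makes the passage from (22) and (23) to (24) explicit. With $\varphi(t) = e^{-pt}$ (nonnegative, nonincreasing, and with $t^{n-1}e^{-pt}$ integrable on $(0, \infty)$ for every p > 0), the numerator and denominator differ only by a constant factor:

$$\int_0^{\infty}t^{n-1}e^{-p(t+L)}\,dt = e^{-pL}\int_0^{\infty}t^{n-1}e^{-pt}\,dt = e^{-pL}\,\frac{\Gamma(n)}{p^{n}},$$

so the right-hand side of both (22) and (23) equals $e^{-pL}$ exactly.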


It is of interest to note that there do exist regions such that the power function is independent of both ξ and σ. For example, let $S_n$ be the standard deviation of the observed values $(x_1, x_2, \ldots, x_n)$ and let $S_{n-1}$ be the standard deviation of the values $(x_1, x_2, \ldots, x_{n-1})$; then the region w given by all points $(x_1, x_2, \ldots, x_n)$ which satisfy the inequality $S_{n-1}/S_n \ge C$ is such a region (the ratio is unchanged when every $x_i$ is replaced by $(x_i - \xi)/\sigma$), i.e.

$$P\{S_{n-1}/S_n \ge C \mid \xi, \sigma\} \tag{26}$$

is constant, whatever be the values of ξ and σ. Such regions are, however, unsuitable for testing “Student’s” hypothesis $\xi = \xi_0$, because they will reject this hypothesis with equal frequency when it is wrong and when it is correct.
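A minimal Monte Carlo sketch of the region (26), written in Python for illustration (the names `sd` and `rejection_rate` are hypothetical, and n = 5, C = 1 are arbitrary choices). Each sample is generated as $x_i = \xi + \sigma z_i$ from standard normal $z_i$, so the estimated rejection rate comes out the same for every (ξ, σ) pair, including pairs where the hypothesis is false.

```python
import math
import random

def sd(values):
    """Standard deviation with divisor len(values), as for S_n in the text."""
    k = len(values)
    mean = sum(values) / k
    return math.sqrt(sum((v - mean) ** 2 for v in values) / k)

def rejection_rate(n, xi, sigma, C, reps=10000, seed=7):
    """Monte Carlo estimate of P{S_{n-1}/S_n >= C | xi, sigma}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [xi + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        if sd(xs[:-1]) / sd(xs) >= C:
            hits += 1
    return hits / reps

for xi, sigma in [(0.0, 1.0), (0.0, 10.0), (5.0, 1.0), (-3.0, 0.2)]:
    print(xi, sigma, rejection_rate(n=5, xi=xi, sigma=sigma, C=1.0))
```

Because every call reuses the same seed, all four calls see the same standardized draws and the estimates coincide; with different seeds they would agree up to Monte Carlo error.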


The author is indebted to Professor J. Neyman for assistance in preparing 
the present paper. 
A METHOD FOR RECURRENT COMPUTATION OF ALL THE
PRINCIPAL MINORS OF A DETERMINANT, AND ITS
APPLICATION IN CONFLUENCE ANALYSIS


By Olav Reiersøl


1. Recurrent computation of all the principal minors of a determinant. 
The formulae which I develop in this paper have been worked out for use in 
statistical confluence analysis. By means of recurrent computation they shorten 
considerably the amount of work required to compute all principal minors of a 
square matrix. Originally I elaborated this method as a simplification of one 
given by Frisch (not published). 

Subsequently I found that the method could more easily be deduced from the 
pivotal method. This method has been described, for example, by Whittaker 
and Robinson [5] and by Aitken [1]. 

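Let us consider a square n-rowed matrix

$$\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{Vmatrix}. \tag{1}$$

Let the adjoint of this matrix be $\|p_{ij}\|$ and let us denote its determinant value by $D_{12\cdots n}$. Then we have the following identity:

$$\begin{vmatrix} p_{n-1,n-1} & p_{n-1,n}\\ p_{n,n-1} & p_{nn} \end{vmatrix} = D_{12\cdots n}\,D_{12\cdots n-2}. \tag{2}$$

As Aitken points out, the pivotal method is based upon this identity.

Next consider the following matrix, which is formed from the matrix (1) by striking out the nth row and the (n − 1)th column:

$$\begin{Vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1n}\\ \vdots & & \vdots & \vdots\\ a_{n-1,1} & \cdots & a_{n-1,n-2} & a_{n-1,n} \end{Vmatrix}. \tag{3}$$

Let us denote its adjoint by $\|q_{ij}\|$ and its determinant value by $A_{12\cdots n} = -p_{n-1,n}$.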
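The determinant

$$\begin{vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n-1}\\ \vdots & & \vdots & \vdots\\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n-1}\\ a_{n1} & \cdots & a_{n,n-2} & a_{n,n-1} \end{vmatrix} = -p_{n,n-1}$$

we shall denote by $B_{12\cdots n}$. The identity (2) can now be written

$$D_{12\cdots n} = \frac{D_{12\cdots n-2,n}\,D_{12\cdots n-1} - A_{12\cdots n}\,B_{12\cdots n}}{D_{12\cdots n-2}}. \tag{2'}$$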


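If we apply the identity (2) to the matrix (3) we get

$$\begin{vmatrix} q_{n-2,n-2} & q_{n-2,n-1}\\ q_{n-1,n-2} & q_{n-1,n-1} \end{vmatrix} = A_{12\cdots n}\,D_{12\cdots n-3},$$

which may also be written

$$A_{12\cdots n} = \frac{A_{12\cdots n-3,n-1,n}\,D_{12\cdots n-2} - A_{12\cdots n-3,n-2,n}\,B_{12\cdots n-1}}{D_{12\cdots n-3}}. \tag{4}$$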


To simplify the notation we will not write the affixes present, but write the affixes not present in inverted parentheses. Then our formulae (2′) and (4) can be written

$$D = \frac{D_{)n-1(}\,D_{)n(} - AB}{D_{)n-1,n(}}, \qquad A = \frac{A_{)n-2(}\,D_{)n-1,n(} - A_{)n-1(}\,B_{)n(}}{D_{)n-2,n-1,n(}}.$$

In an analogous way we get

$$B = \frac{B_{)n-2(}\,D_{)n-1,n(} - B_{)n-1(}\,A_{)n(}}{D_{)n-2,n-1,n(}}.$$





We may apply these formulae to an arbitrary principal minor $D_{v_1v_2\cdots v_k}$. Let us now denote $D_{v_1v_2\cdots v_k}$ by D and denote the absence of one or more of the numbers $v_1, v_2, \ldots, v_k$ by placing them in inverted parentheses. We then have the formulae:

$$A = \frac{A_{)v_{k-2}(}\,D_{)v_{k-1},v_k(} - A_{)v_{k-1}(}\,B_{)v_k(}}{D_{)v_{k-2},v_{k-1},v_k(}}, \tag{5a}$$

$$B = \frac{B_{)v_{k-2}(}\,D_{)v_{k-1},v_k(} - B_{)v_{k-1}(}\,A_{)v_k(}}{D_{)v_{k-2},v_{k-1},v_k(}}, \tag{5b}$$

$$D = \frac{D_{)v_{k-1}(}\,D_{)v_k(} - AB}{D_{)v_{k-1},v_k(}}. \tag{5c}$$
By means of these formulae we can recurrently compute all principal minors, a D with no affixes being taken as 1. We begin with $D_i = a_{ii}$ (i = 1, 2, …, n), $A_{ij} = a_{ij}$, $B_{ij} = a_{ji}$, where i < j. Then we compute the D’s with two affixes,

$$D_{ij} = D_iD_j - A_{ij}B_{ij},$$

and then the quantities A, B, D with three affixes,

$$A_{ijk} = A_{jk}D_i - A_{ik}B_{ij}, \qquad B_{ijk} = B_{jk}D_i - B_{ik}A_{ij}, \qquad D_{ijk} = \frac{D_{ij}D_{ik} - A_{ijk}B_{ijk}}{D_i}.$$

Then we compute the quantities A, B, D with four affixes, and so on. 
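The recurrences (5a) to (5c) translate directly into a dictionary-based routine. The following Python sketch is an illustration under stated assumptions, not the author’s own procedure: index sets are 0-based sorted tuples (so (0, 1, 2) stands for $D_{123}$), D of the empty set is 1, every divisor is assumed non-zero, and the helper names are hypothetical. A plain Gaussian-elimination determinant is included only to check the results.

```python
from itertools import combinations

def principal_minors(a):
    """Every principal minor of the square matrix `a`, via (5a)-(5c).

    Keys are sorted tuples of 0-based indices; D[()] = 1 is the minor with
    no affixes.  All minors used as divisors are assumed to be non-zero.
    """
    n = len(a)
    D, A, B = {(): 1.0}, {}, {}
    for i in range(n):
        D[(i,)] = a[i][i]

    def drop(S, *gone):
        return tuple(s for s in S if s not in gone)

    for k in range(2, n + 1):
        for S in combinations(range(n), k):
            v1, v0 = S[-2], S[-1]                       # v_{k-1}, v_k
            if k == 2:
                A[S], B[S] = a[v1][v0], a[v0][v1]       # A_ij = a_ij, B_ij = a_ji
            else:
                v2 = S[-3]                              # v_{k-2}
                den = D[drop(S, v2, v1, v0)]
                mid = D[drop(S, v1, v0)]
                A[S] = (A[drop(S, v2)] * mid - A[drop(S, v1)] * B[drop(S, v0)]) / den  # (5a)
                B[S] = (B[drop(S, v2)] * mid - B[drop(S, v1)] * A[drop(S, v0)]) / den  # (5b)
            D[S] = (D[drop(S, v1)] * D[drop(S, v0)] - A[S] * B[S]) / D[drop(S, v1, v0)]  # (5c)
    return D

def det(m):
    """Determinant by Gaussian elimination with partial pivoting (used as a check)."""
    m = [row[:] for row in m]
    size, result = len(m), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda i: abs(m[i][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, size):
            f = m[i][c] / m[c][c]
            for j in range(c, size):
                m[i][j] -= f * m[c][j]
    return result

frisch = [
    [1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
    [-0.121551, 1.000000, 0.657698, -0.732862, 0.212165],
    [0.656809, 0.657698, 1.000000, 0.014385, -0.040183],
    [0.752502, -0.732862, 0.014385, 1.000000, -0.280223],
    [-0.224549, 0.212165, -0.040183, -0.280223, 1.000000],
]
D = principal_minors(frisch)
print(D[(0, 1, 2)], D[(0, 1, 2, 3, 4)], det(frisch))
```

Keying the dictionaries by sorted tuples keeps "the last two affixes" well defined, which is what (5a) to (5c) rely on.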

If we carry through the computations without dropping any figures, we have as a control that all divisions will be exact, without remainder (for a matrix with integer elements). If we are dropping figures, we can control the result by computing the determinant $D_{12\cdots n}$ in another way. If we wish to control the computation before it is completed, we may use our recurrence formulae on the matrix which we get from the original matrix when the rows and the columns are subjected to the same permutation. For example, we can reverse the order of the rows and columns. Then we can control the (k − 1)-rowed minors before computing the k-rowed minors.

If all the D’s are different from zero, we may reduce the necessary number of multiplications and divisions in the following way. We introduce the following notations:

$$a = \frac{A}{D_{)v_{k-1},v_k(}}, \qquad b = \frac{B}{D_{)v_{k-1},v_k(}}, \qquad c = -\frac{B}{D_{)v_k(}}, \qquad d = \frac{D}{D_{)v_k(}}.$$

Substituting in (5), we get the following system of recurrence formulae:

$$a = a_{)v_{k-2}(} + a_{)v_{k-1}(}\,c_{)v_k(}, \tag{6a}$$

$$b = b_{)v_{k-2}(} + a_{)v_k(}\,c_{)v_{k-1}(}, \tag{6b}$$

$$c = -\frac{b}{d_{)v_k(}}, \tag{6c}$$

$$d = d_{)v_{k-1}(} + ac, \tag{6d}$$

$$D = D_{)v_k(}\,d. \tag{6e}$$
An affix $)v_k($ on a letter indicates the deletion of the last row and column in the determinants making up the definition of that letter, even though those determinants are of lower order than $v_k$. Similarly, an affix $)v_{k-1}($ indicates the deletion of the next to the last row and column.

The a’s with two affixes in these formulae are identical with the elements $a_{ij}$ of the matrix (1), where i < j. Further, $b_{ij} = a_{ji}$ (i < j) and $d_i = a_{ii}$. Applying the recurrence formulae (6), we start with these values.

If the matrix (1) is symmetric, i.e. if $a_{ij} = a_{ji}$, then we get

$$B_{v_1v_2\cdots v_k} = A_{v_1v_2\cdots v_k} \quad\text{and}\quad b_{v_1v_2\cdots v_k} = a_{v_1v_2\cdots v_k}.$$

In this case we can therefore replace B by A in the formulae (5) and replace b by a in the formulae (6).

Numerical example. Let us compute all the scatterances in the constructed example given by Frisch [3, p. 121]. The correlation matrix in this example is:

 1.000000   −0.121551    0.656809    0.752502   −0.224549
−0.121551    1.000000    0.657698   −0.732862    0.212165
 0.656809    0.657698    1.000000    0.014385   −0.040183
 0.752502   −0.732862    0.014385    1.000000   −0.280223
−0.224549    0.212165   −0.040183   −0.280223    1.000000


Using our recurrence formulae (6) we get the following table:

            a             c             d             D
12      −0.121 551     0.121 551     0.985 225     0.985 225
13       0.656 809    −0.656 809     0.568 602     0.568 602
23       0.657 698    −0.657 698     0.567 433     0.567 433
14       0.752 502    −0.752 502     0.433 741     0.433 741
24      −0.732 862     0.732 862     0.462 913     0.462 913
34       0.014 385    −0.014 385     0.999 793     0.999 793
15      −0.224 549     0.224 549     0.949 578     0.949 578
25       0.212 165    −0.212 165     0.954 986     0.954 986
35      −0.040 183     0.040 183     0.998 385     0.998 385
45      −0.280 223     0.280 223     0.921 475     0.921 475

123      0.737 534    −0.748 594     0.016 489     0.016 245
124     −0.641 395     0.651 014     0.016 184     0.015 945
134     −0.479 865     0.843 938     0.028 765     0.016 356
234      0.496 387    −0.874 794     0.028 677     0.016 272
125      0.184 871    −0.187 643     0.914 888     0.901 371
135      0.107 303    −0.188 714     0.929 328     0.528 418
235     −0.179 723     0.316 730     0.898 062     0.509 590
145     −0.111 249     0.256 487     0.921 044     0.399 495
245     −0.124 735     0.269 457     0.921 272     0.426 516
345     −0.279 645     0.279 703     0.920 167     0.919 977

1234     0.000 279    −0.016 6       0.016 179     0.000 262 83
1235    −0.031 090     1.885 5       0.856 268     0.013 910
1245     0.009 105    −0.562 6       0.909 766     0.014 506
1345    −0.020 692     0.719 35      0.914 443     0.014 957
2345     0.032 486    −1.132 8       0.861 262     0.014 014

12345    0.009 621    −0.594 7       0.850 546     0.000 223 55
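For a symmetric matrix the scheme (6) needs only a, c, d and D. The Python sketch below (hypothetical names; 0-based indices, so (0, 1, 2) is the row labelled 123) applies (6a) and (6c) to (6e), with b replaced by a, to Frisch’s correlation matrix. It works in full floating-point precision rather than with six-decimal rounding, so some later entries may differ from the table in the last printed digits.

```python
from itertools import combinations

def symmetric_table(r):
    """a, c, d, D for every index set, via (6a), (6c)-(6e) with b = a."""
    n = len(r)
    a, c, d, D = {}, {}, {}, {}
    for i in range(n):
        d[(i,)] = D[(i,)] = r[i][i]                  # d_i = D_i = a_ii

    def drop(S, x):
        return tuple(s for s in S if s != x)

    for k in range(2, n + 1):
        for S in combinations(range(n), k):
            v1, v0 = S[-2], S[-1]                    # v_{k-1}, v_k
            if k == 2:
                a[S] = r[v1][v0]                     # a_ij = a_ij of the matrix
            else:
                v2 = S[-3]                           # v_{k-2}
                a[S] = a[drop(S, v2)] + a[drop(S, v1)] * c[drop(S, v0)]   # (6a)
            c[S] = -a[S] / d[drop(S, v0)]            # (6c), with b = a
            d[S] = d[drop(S, v1)] + a[S] * c[S]      # (6d)
            D[S] = D[drop(S, v0)] * d[S]             # (6e)
    return a, c, d, D

frisch = [
    [1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
    [-0.121551, 1.000000, 0.657698, -0.732862, 0.212165],
    [0.656809, 0.657698, 1.000000, 0.014385, -0.040183],
    [0.752502, -0.732862, 0.014385, 1.000000, -0.280223],
    [-0.224549, 0.212165, -0.040183, -0.280223, 1.000000],
]
a, c, d, D = symmetric_table(frisch)
for S in [(0, 1, 2), (0, 1, 4), (1, 2, 3, 4), (0, 1, 2, 3, 4)]:
    label = "".join(str(i + 1) for i in S)
    print(label, round(a[S], 6), round(c[S], 6), round(d[S], 6), round(D[S], 8))
```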
2. Computation of the coefficients of the characteristic polynomial of a matrix. The characteristic polynomial of the matrix (1) is

$$\begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = P_n - P_{n-1}\lambda + P_{n-2}\lambda^2 - \cdots + (-1)^n\lambda^n.$$

As is well known, the coefficient $P_k$ can be calculated as the sum of all the k-rowed principal minors of the matrix (1). Our method of computing all the principal minors of a matrix therefore gives us, as a by-product, a method of computing the coefficients of the characteristic polynomial. Another method for the determination of these coefficients has been given by Paul Horst [4].
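A small exact-arithmetic sketch of the identity just quoted, in Python. It sums the k-rowed principal minors to obtain $P_k$ and checks the resulting polynomial against the determinant with λ subtracted on the diagonal. For brevity the minors are computed here by cofactor expansion (the helper `det` is hypothetical and is not Reiersøl’s recurrence); the 3 × 3 matrix is an arbitrary example.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def char_poly_coeffs(a):
    """[P_0, P_1, ..., P_n], where P_k is the sum of all k-rowed principal minors."""
    n = len(a)
    return [sum((det([[a[i][j] for j in S] for i in S])
                 for S in combinations(range(n), k)), Fraction(0))
            for k in range(n + 1)]

def char_poly_value(P, lam):
    """P_n - P_{n-1} lam + P_{n-2} lam^2 - ... + (-1)^n lam^n."""
    n = len(P) - 1
    return sum((-1) ** (n - k) * P[k] * lam ** (n - k) for k in range(n + 1))

a = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
P = char_poly_coeffs(a)
print(P)   # P_0 = 1, P_1 = trace, P_3 = determinant
```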

We may obtain a comparison between the work of computation entailed by the two methods by calculating the number of multiplications and divisions necessary when using one or the other method. If our recurrence formulae (6) are used, two multiplications and one division are necessary for computing a 2-rowed minor, and four multiplications and one division for every minor with 3 or more rows. Consequently the total number of multiplications and divisions will be:

$$S_n = 3\binom{n}{2} + 5\sum_{k=3}^{n}\binom{n}{k} = 5\cdot 2^n - (n^2 + 4n + 5).$$


On using Horst’s method, the number of necessary multiplications and divisions will be found to be


H, = (4n — 1)n® + 3n® + in — 1)(n + 2) 


$$H_n = \tfrac{1}{2}(n - 1)(n^3 + n + 2) \qquad (n \text{ even}),$$

$$H_n = \tfrac{1}{2}(n - 1)(n^3 + n^2 + n + 2) \qquad (n \text{ odd}).$$
When n = 2, 3, …, 12, $S_n$ and $H_n$ acquire the following values:

  n       S_n      H_n
  2         3        6
  3        14       41
  4        43      105
  5       110      314
  6       255      560
  7       558     1203
  8      1179     1827
  9      2438     3284
 10      4975     4554
 11     10070     7325
 12     20283     9581

We see that our method of computing the coefficients of the characteristic polynomial involves less calculation when n < 10, while Horst’s method is superior when n ≥ 10.
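The operation counts and the crossover at n = 10 are easy to tabulate. This Python sketch (hypothetical names) computes $S_n$ from the per-minor costs stated above and checks it against the closed form. For $H_n$ it uses the even/odd expressions $\tfrac12(n-1)(n^3+n+2)$ and $\tfrac12(n-1)(n^3+n^2+n+2)$; treat these as an assumption, chosen because they reproduce every tabulated value of $H_n$.

```python
from math import comb

def reiersol_count(n):
    """S_n: 3 operations per 2-rowed minor, 5 per minor of 3 or more rows."""
    return 3 * comb(n, 2) + 5 * sum(comb(n, k) for k in range(3, n + 1))

def reiersol_closed(n):
    """Closed form 5 * 2^n - (n^2 + 4n + 5)."""
    return 5 * 2 ** n - (n * n + 4 * n + 5)

def horst_count(n):
    """H_n in the even/odd forms that match the table (assumed, see above)."""
    if n % 2 == 0:
        return (n - 1) * (n ** 3 + n + 2) // 2
    return (n - 1) * (n ** 3 + n ** 2 + n + 2) // 2

for n in range(2, 13):
    print(n, reiersol_count(n), horst_count(n))
```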

If our purpose is to find the characteristic roots of the matrix, we can do this with less computation without first finding the coefficients of the characteristic polynomial; see Aitken [2].

















3. Applications in confluence analysis. The confluence analysis of Frisch is set forth in his book “Statistical Confluence Analysis by Means of Complete Regression Systems” [3].

The main method of this book is the “bunch analysis,” which includes the computation of the adjoints of the correlation matrices of all sets of variates contained in the total set. In section 1, Frisch has described a preliminary analysis by means of scatterances. The scatterances are the principal minors of the correlation matrix of the total set of variates. If we carry through such an analysis, the recurrence formulae of section 1 of this paper will give a rapid method for the calculation of all the scatterances.

Another application of the computation of all the scatterances arises in the 
determination of the correct time lags between variates in a structural equation. 
This problem will be treated in a paper on confluence analysis which will appear 
in the near future. 
NOTES


This section is devoted to brief research and expository articles, notes on methodology
and other short items.
A CRITERION FOR TESTING THE HYPOTHESIS THAT TWO
SAMPLES ARE FROM THE SAME POPULATION


By W. J. Dixon


1. Introduction. The purpose of this paper is to consider a criterion for testing the hypothesis that two samples have been drawn from populations with the same distribution function, assuming only that the cumulative distribution function common to the two populations is continuous. Let the two samples, $O_n$ and $O_m$, be of size n and m respectively. We may assume n < m without loss of generality. Suppose the elements $u_1, \ldots, u_n$ of $O_n$ are arranged in order from the smallest to the largest, that is, $u_1 < u_2 < \cdots < u_n$. These may be represented as points along a line. The elements of $O_m$, represented as points on the same line, are then divided into (n + 1) groups by the first sample, $O_n$. Let $m_1$ be the number of points having a value less than $u_1$, $m_{i+1}$ the number lying between $u_i$ and $u_{i+1}$ (i = 1, 2, …, n − 1), and $m_{n+1}$ the number greater than $u_n$ ($m_{n+1} = m - m_1 - m_2 - \cdots - m_n$). The criterion here proposed is¹

$$C^2 = \sum_{i=1}^{n+1}\left(\frac{1}{n+1} - \frac{m_i}{m}\right)^2. \tag{1}$$
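In code, the grouping and the criterion (1) take only a few lines. The following Python sketch is illustrative, not from the paper; the helper names `group_counts` and `dixon_c2` are hypothetical, and ties between the two samples are assumed absent (they occur with probability zero under a continuous distribution).

```python
from bisect import bisect_left

def group_counts(first, second):
    """m_1, ..., m_{n+1}: points of `second` below u_1, between successive
    order statistics u_i of `first`, and above u_n (no ties assumed)."""
    u = sorted(first)
    counts = [0] * (len(u) + 1)
    for x in second:
        counts[bisect_left(u, x)] += 1   # number of u_i smaller than x
    return counts

def dixon_c2(counts):
    """Criterion (1): C^2 = sum over the n + 1 groups of (1/(n+1) - m_i/m)^2."""
    groups, m = len(counts), sum(counts)
    return sum((1 / groups - mi / m) ** 2 for mi in counts)

counts = group_counts([0.8, 2.5, 4.1], [0.1, 1.7, 1.9, 3.0, 5.6])
print(counts, dixon_c2(counts))
```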


1A similar criterion 


Pe 
n 


$a) 


n 


for two samples of the same size was investigated (unpublished) by A. M. Mood. He 
found the mean and variance to be 


2n + 2. _ 8(n — 1)(2n + 1) 
E(d@) = — +" * 


It can be seen that this is the sum of the squares of the differences between the ordinates 
of the two cumulative sample distributions calculated at the jumps of the first sample 
distribution. 
2. The mean and variance of C². The only case of continuous cumulative distribution functions F(x) of any interest in statistics is that in which dF(x) = f(x) dx, where f(x) is a probability density function. Let us write

$$p_1 = \int_{-\infty}^{u_1} f(x)\,dx, \qquad p_i = \int_{u_{i-1}}^{u_i} f(x)\,dx \quad (i = 2, \ldots, n), \qquad p_{n+1} = \int_{u_n}^{\infty} f(x)\,dx,$$

where of course $p_{n+1} = 1 - p_1 - p_2 - \cdots - p_n$.









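Now, the joint distribution law of the $p_i$ is

$$P(p_1, \ldots, p_n) = n!\,dp_1\cdots dp_n, \tag{2}$$

and the conditional distribution of the $m_i$ given the $p_i$ is

$$P(m_1, \ldots, m_{n+1} \mid p_1, \ldots, p_n) = \frac{m!}{m_1!\cdots m_{n+1}!}\,p_1^{m_1}\cdots p_{n+1}^{m_{n+1}}. \tag{3}$$

Therefore the joint probability law of the $m_i$ and $p_i$ is

$$P(m, p) = \frac{n!\,m!}{m_1!\cdots m_{n+1}!}\,p_1^{m_1}p_2^{m_2}\cdots p_{n+1}^{m_{n+1}}\,dp_1\cdots dp_n. \tag{4}$$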


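Let

$$\phi(\theta) = \phi(\theta_1, \ldots, \theta_{n+1}) = E\left[\exp\sum_{i=1}^{n+1}\theta_i\left(\frac{1}{n+1} - \frac{m_i}{m}\right)\right];$$

then

$$E(C^2) = \sum_{i=1}^{n+1}\left.\frac{\partial^2\phi}{\partial\theta_i^2}\right|_{\theta=0}, \tag{5}$$

$$E[(C^2)^2] = \sum_{i=1}^{n+1}\left.\frac{\partial^4\phi}{\partial\theta_i^4}\right|_{\theta=0} + \sum_{i\neq j}\left.\frac{\partial^4\phi}{\partial\theta_i^2\,\partial\theta_j^2}\right|_{\theta=0}, \tag{6}$$

and

$$\phi(\theta) = \sum_m\int\exp\left[\sum_{i=1}^{n+1}\theta_i\left(\frac{1}{n+1} - \frac{m_i}{m}\right)\right]P(m, p), \tag{7}$$

where $\sum_m$ denotes the usual multinomial summation over all integral values of $m_i \ge 0$ for which $\sum m_i = m$, and the integration is over the generalized tetrahedron defined by $p_i > 0$ and $p_1 + p_2 + \cdots + p_n < 1$. If we perform the summation first, we obtain

$$\phi(\theta) = n!\,e^{\sum_i\theta_i/(n+1)}\int\left(p_1e^{-\theta_1/m} + \cdots + p_{n+1}e^{-\theta_{n+1}/m}\right)^m dp_1\cdots dp_n. \tag{8}$$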


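Differentiating twice with respect to $\theta_i$ and setting the θ’s equal to zero, we get

$$\left.\frac{\partial^2\phi}{\partial\theta_i^2}\right|_{\theta=0} = n!\int\left[\frac{1}{(n+1)^2} + \left(\frac{1}{m} - \frac{2}{n+1}\right)p_i + \frac{m-1}{m}\,p_i^2\right]dp_1\cdots dp_n.$$

If we now integrate (using $n!\int p_i\,dp_1\cdots dp_n = 1/(n+1)$ and $n!\int p_i^2\,dp_1\cdots dp_n = 2/[(n+1)(n+2)]$) and sum from one to n + 1, we find

$$E(C^2) = \frac{n(n+m+1)}{m(n+1)(n+2)}.$$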
Performing the operations indicated in (6), we obtain $E[(C^2)^2]$, from which we subtract $[E(C^2)]^2$ and have as the variance of C²

$$\sigma^2_{C^2} = \frac{4n(m-1)(m+n+1)(m+n+2)}{m^3(n+2)^2(n+3)(n+4)}.$$
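The two moment formulas can be checked by simulation. Because C² depends only on the ordering of the pooled observations, any continuous law may be used under the hypothesis; this Python sketch (hypothetical helper names, uniform samples assumed, n = 5 and m = 8 chosen arbitrarily) compares the empirical mean and variance of C² with the formulas above.

```python
import random
from bisect import bisect_left

def mean_c2(n, m):
    """E(C^2) = n(n + m + 1) / [m(n + 1)(n + 2)]."""
    return n * (n + m + 1) / (m * (n + 1) * (n + 2))

def var_c2(n, m):
    """Variance of C^2 as given above."""
    return (4 * n * (m - 1) * (m + n + 1) * (m + n + 2)
            / (m ** 3 * (n + 2) ** 2 * (n + 3) * (n + 4)))

def simulate_c2(n, m, reps=20000, seed=3):
    """C^2 for `reps` pairs of independent uniform samples of sizes n and m."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        u = sorted(rng.random() for _ in range(n))
        counts = [0] * (n + 1)
        for _ in range(m):
            counts[bisect_left(u, rng.random())] += 1
        values.append(sum((1 / (n + 1) - c / m) ** 2 for c in counts))
    return values

vals = simulate_c2(5, 8)
mc_mean = sum(vals) / len(vals)
mc_var = sum((v - mc_mean) ** 2 for v in vals) / (len(vals) - 1)
print(mc_mean, mean_c2(5, 8), mc_var, var_c2(5, 8))
```

For tiny cases the formulas can also be checked by hand: with n = m = 2 the two points of the second sample fall in one group with probability 1/2, giving C² = 2/3, and otherwise C² = 1/6, so E(C²) = 5/12 and the variance is 1/16.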


3. Significance values of C². If we let $C^2_\alpha$ be defined as the smallest value of C² for which $P(C^2 > C^2_\alpha) < \alpha$, then we can compute the value of $C^2_\alpha$ fairly

TABLE I
Values of $C^2_\alpha$; α = 0.01, 0.05, 0.10


4 5 6 7 


800 


833 
.800 .833 


—— ——— 2 
.800 .833 .857 
800 .556 .413 


—  .833 .857 .875 
.800 .588 .612 .467 
555 .425 .449 .426 


.800 .833 .857 .656 .670 
800 .594 .482 .469 .389 
425 .413 .357 .375 § .358 


—  .800 .833 .660 .677 .543 ~~ .554 
.750 .602 .448 .413 .431 .395 = .381 
552 .454 .389 .363 .356 .321 .307 


—  .800 .833 .677 .555 .549 .480 .449 
10 667 .750 .480 .493 .437 .415 .349 .340 = .349 
487 .430 .380 .373 .357 = .315 .309 .280 8.269 


readily for small values of m and n. The values of $C^2_\alpha$ for m, n ≤ 10 are given in Table I for α = 0.01, 0.05 and 0.10. Since the distribution of C² is not continuous, the probabilities $P(C^2 > C^2_\alpha)$ will, in general, be less than α.







It will be seen that if m and n increase indefinitely in the ratio n/m = γ, then nC² converges stochastically to γ + 1, whereas nC² ranges from 0 to $n^2/(n+1)$, which indicates a tail to the right. This suggests that for larger values of m and n it is reasonable to try to fit the distribution of nC² by the method of moments, using a distribution of the form

$$\frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,(kz^2)^{\nu/2-1}\,e^{-kz^2/2}\,d(kz^2), \tag{11}$$

which has

$$E(z^2) = \frac{\nu}{k}, \qquad \sigma^2_{z^2} = \frac{2\nu}{k^2}.$$
Setting z² = nC², we see that we can consider nkC² distributed as χ² with ν degrees of freedom. Of course, ν is not necessarily an integer, but χ² tables may be used for approximate values of the probability that nkC² will exceed certain values,² or the values of nkC² that will be exceeded a certain per cent of the time.³ More exact values of these probabilities that nkC² will exceed a certain value may be found from a table of the incomplete Gamma function.⁴

To calculate k and ν directly, the following formulas, obtained by equating the mean and variance of (11) to the mean and variance of nC², may be used:

$$k = \frac{a\,m(n+2)}{n}, \qquad \nu = \frac{a\,n(n+m+1)}{n+1}, \tag{12}$$

where

$$a = \frac{m(n+3)(n+4)}{2(m-1)(m+n+2)(n+1)}.$$


If the fitted curve (11) is used to obtain significance values of nC², there is a tendency toward rejecting slightly more than 100α per cent of the time, especially for small values of m and n. The error is probably due to fitting a curve having an infinite range. The discrepancy decreases as m and n increase.
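A short Python sketch of (12); `fit_k_nu` and `fit_from_moments` are hypothetical names. The second function re-derives k and ν from $E(nC^2) = \nu/k$ and $\sigma^2_{nC^2} = 2\nu/k^2$, which is what equating the moments of (11) and nC² amounts to, so the two routes should agree.

```python
def fit_k_nu(n, m):
    """k and nu from formulas (12)."""
    a = m * (n + 3) * (n + 4) / (2 * (m - 1) * (m + n + 2) * (n + 1))
    return a * m * (n + 2) / n, a * n * (n + m + 1) / (n + 1)

def fit_from_moments(n, m):
    """k and nu from E(nC^2) = nu/k and Var(nC^2) = 2 nu / k^2."""
    mean = n * n * (n + m + 1) / (m * (n + 1) * (n + 2))
    var = n * n * (4 * n * (m - 1) * (m + n + 1) * (m + n + 2)
                   / (m ** 3 * (n + 2) ** 2 * (n + 3) * (n + 4)))
    return 2 * mean / var, 2 * mean ** 2 / var

for n, m in [(9, 10), (12, 12)]:
    k, nu = fit_k_nu(n, m)
    print(n, m, "nk =", round(n * k, 3), "nu =", round(nu, 3))
```

For n = 12, m = 12 this gives values agreeing with the nk = 65.068 and ν = 8.938 of Case 2 below to within 0.001; for n = 9, m = 10 it gives nk = 2860/63 and ν = 52/7 exactly.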

The goodness of fit at the 0.01, 0.05 and 0.10 significance levels was tested for two cases.

Case 1. n = 9, m = 10; nk = 2860/63 ≈ 45.40, ν = 52/7 ≈ 7.43.
The exact distribution in the region under consideration is the following:

C₀²             ...   .26 ||  .28   .30   .32 ||  .34   .36   .40   .42 ||  .44   .48  ...
P(C² > C₀²)     ...  .121 || .090  .082  .072 || .037  .033  .025  .025 || .015  .007  ...

The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.422$, $C^2_{.05} = 0.323$ and $C^2_{.10} = 0.277$. The double rules indicate the divisions (from the fitted curve) for α = 0.01, 0.05 and 0.10.








² Karl Pearson, Tables for Statisticians and Biometricians, Part I, Table XII.
³ R. A. Fisher, Statistical Methods for Research Workers, Table III.
⁴ Tables of the Incomplete Gamma Function, Biometrika Office, London.
Case 2. n = 12, m = 12; nk = 65.068, ν = 8.938.
The important part of the exact distribution for our purposes is:

C₀²             ...  .215 || .229  .243  .256 || .270  ...  .326 || .340  .354  .381
P(C² > C₀²)     ...  .120 || .109  .078  .057 || .046  ...  .017 || .014  .011  .009

The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.3315$, $C^2_{.05} = 0.2587$ and $C^2_{.10} = 0.2244$.


4. Examples. 1. Two samples of ten members each are drawn, and it is desired to test, using a rejection region of size α, the hypothesis that these two samples could have originated from the same population, about which nothing is assumed except that it is continuous. The first sample was found to divide the second sample into the following groups: 0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0.

$$C^2 = \left(\tfrac{1}{11} - \tfrac{3}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{4}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{2}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{1}{10}\right)^2 + 7\left(\tfrac{1}{11}\right)^2 = .209,$$

which we see from Table I is not a significant value even for α = 0.10, since $C^2_{.10} = 0.269$.

2. A sample of 15 divides a second of 25 into the following 16 groups: 0, 1, 0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0.

$$C^2 = \left(\tfrac{1}{16} - \tfrac{5}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{4}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{3}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{9}{25}\right)^2 + 4\left(\tfrac{1}{16} - \tfrac{1}{25}\right)^2 + 8\left(\tfrac{1}{16}\right)^2,$$

$$nC^2 = 2.302, \qquad k = 7.511, \qquad \nu = 10.19, \qquad nkC^2 = 17.295,$$

which gives a significant value for α = 0.10 but not for α = 0.05, since $nkC^2_{.10} = 16.233$ and $nkC^2_{.05} = 18.568$. Actually $P(nkC^2 > 17.29) = .077$.
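The arithmetic of both examples can be reproduced with a few lines of Python (an illustrative sketch; the names are hypothetical, and k and ν come from formulas (12)). The computed k, ν and nkC² agree with the printed values to within a few thousandths.

```python
def c2_from_counts(counts):
    """C^2 of (1) from the group sizes m_1, ..., m_{n+1}."""
    groups, m = len(counts), sum(counts)
    return sum((1 / groups - c / m) ** 2 for c in counts)

def k_and_nu(n, m):
    """Formulas (12)."""
    a = m * (n + 3) * (n + 4) / (2 * (m - 1) * (m + n + 2) * (n + 1))
    return a * m * (n + 2) / n, a * n * (n + m + 1) / (n + 1)

example1 = [0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0]                 # n = 10, m = 10
example2 = [0, 1, 0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0]  # n = 15, m = 25

c2_1 = c2_from_counts(example1)
n, m = len(example2) - 1, sum(example2)
c2_2 = c2_from_counts(example2)
k, nu = k_and_nu(n, m)
print(round(c2_1, 3), round(n * c2_2, 4), round(k, 3), round(nu, 2), round(n * k * c2_2, 3))
```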


5. Remarks. If we set W equal to the number of $m_i$ which are zero and $V = n + 1 - W$, then V is the number of non-zero $m_i$; further, $2V \approx U$, where U is the total number of runs, the criterion proposed in the paper of Wald and Wolfowitz in the present issue of the Annals of Mathematical Statistics. Now,


n+l 
(13) W= $Iiim 


Zys*Tn+1—90 tml 


so that, setting 


(14) © = Zs exp b 6; (,. + ;> “)) Se ar P(m, p), 


analogous to (7), we have 


E(WC’) = si ilim > I, 


2 
Zi *'stn+1—70 k=l OO, 


. -evern Tar Fin Berri tera ee UF Ser 
from which we can find


__2nQ =m) 
m(n + 2)(m + n) 


Pyc? Fy Fc? = 


fap = et 3nt Him+n—- 1) _ 
vor eve” (n+ 1)\(m + n + 1)(m + n + 2) 
If n/m = y (a fixed constant) and n is large 
ca, n 
~n+m 
ρ will be near 1 when n is much larger than m. This corresponds, in computing C², to dividing the smaller sample into subgroups by the larger. In this case U and C² give essentially the same information. When m and n are more nearly equal, the two criteria are quite different. For n > m, C² has fewer possible values than for n < m, and it is therefore a more sensitive test when n < m.

While it is doubtful that this test is biased for large samples, this question will not be considered in the present note.


PRINCETON UNIVERSITY, 
PRINCETON, N. J. 


SIGNIFICANCE TEST FOR SPHERICITY OF A NORMAL n-VARIATE
DISTRIBUTION


By John W. Mauchly


1. Introduction. This note is concerned with testing the hypothesis that a sample from a normal n-variate population is in fact from a population for which the variances are all equal and the correlations are all zero. A population having this symmetry will be called “spherical.” Under a linear orthogonal transformation of variates, a spherical population remains spherical, and consequently the features of a sample which furnish information relevant to this hypothesis must be invariant under such transformations.

A situation for which this test is indicated arises when the sample consists 
of N n-dimensional vectors, for which the variates are the n components along 
coordinate axes known to be mutually perpendicular, but having an orientation 
which is, a priori at least, quite arbitrary. A specific application for two 
dimensions, treated elsewhere [1], may be mentioned. Each of N days fur- 
nishes a sine and a cosine Fourier coefficient for a given periodicity, and these, 
when plotted as ordinate and abscissa, yield a somewhat elliptical cloud of N
points. The sine and cosine functions are orthogonal, and their variances have 
equal expectancies for a random series. The arbitrary nature of the orientation
of axes appears here as the arbitrary choice of phase, or origin of time. Of the 
five ellipses studied, three could easily have come from circular populations 
(random), and two showed highly significant ellipticity. 


2. Likelihood ratio criterion for sphericity. The method of Neyman and 
Pearson [2] will be used to derive a test criterion which seems entirely suitable. 
Let Ω be the class of all normal n-variate populations, and let ω be the subclass of all normal n-variate populations satisfying the hypothesis of “sphericity.” The likelihood ratio criterion is obtained by taking the ratio of the maximum of the likelihood for variation of all population parameters specifying ω, to the maximum of the likelihood for variation of all population parameters specifying Ω. That is,

$$\lambda = \frac{P(\omega\ \max)}{P(\Omega\ \max)}. \tag{1}$$

For the set Ω, the probability law for a single observation of the n variates may be written:

$$P = K\,|c_{ij}|^{1/2}\exp\Big[-\tfrac12\sum_{i,j} c_{ij}(x_i - a_i)(x_j - a_j)\Big] \qquad (i, j = 1, \dots, n), \tag{2}$$

where $c_{ij}$ is an element of the matrix $\|a_{ij}\|^{-1}$, the $a_{ij}$ being variances and covariances, $a_i$ is the mean value of the variate $x_i$ in the population, and K is a constant the value of which does not concern us here. Then a sample of N from Ω has the probability

$$P = K^N|c_{ij}|^{N/2}\exp\Big[-\tfrac12\sum_{\alpha=1}^{N}\sum_{i,j} c_{ij}(x_{i\alpha} - a_i)(x_{j\alpha} - a_j)\Big]. \tag{3}$$


Letting

$$\sum_{\alpha=1}^{N} x_{i\alpha} = N\bar x_i \quad\text{and}\quad \sum_{\alpha=1}^{N}(x_{i\alpha} - \bar x_i)(x_{j\alpha} - \bar x_j) = N s_{ij}, \tag{4}$$

differentiating the logarithm of P with respect to the parameters $a_i$ and $a_{ij}$, and setting these derivatives equal to zero, the maximum likelihood estimates

$$\hat a_i = \bar x_i, \qquad \hat a_{ij} = s_{ij}, \tag{5}$$

are obtained. Substituting these values in equation (3) we find that the maximum value of the likelihood is

$$P(\Omega\ \max) = K^N|s_{ij}|^{-N/2}e^{-nN/2}. \tag{6}$$


The derivation of $P(\omega\ \max)$ proceeds upon similar lines, but is simpler, for the probability law for the set ω is obtained from (3) by setting

$$c_{ij} = c\,\delta_{ij}, \tag{7}$$

where c is any positive constant, and $\delta_{ij} = 0$ if $i \ne j$ and 1 if $i = j$. The result is found to be

$$P(\omega\ \max) = K^N s^{-nN/2}e^{-nN/2}, \tag{8}$$

where s is defined by

$$n s = \sum_{i=1}^{n} s_{ii}. \tag{9}$$


The likelihood ratio criterion is therefore

$$\lambda = \frac{|s_{ij}|^{N/2}}{s^{nN/2}}. \tag{10}$$

It will be convenient to designate the Nth root of this statistic as $L_{sn}$, where the second subscript indicates the number of variates:

$$L_{sn} = \lambda^{1/N} = \frac{|s_{ij}|^{1/2}}{s^{n/2}}. \tag{11}$$
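As an illustration of (4) and (9) to (11), here is a minimal Python sketch that computes $L_{sn}$ from a sample of N points. It assumes plain lists of coordinates and uses a hand-written Gaussian elimination for the determinant; the function names are hypothetical and the sketch is only a direct transcription of the formulas, not the author's computing procedure.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mle_covariance(sample: Sequence[Sequence[float]]) -> Matrix:
    """s_ij of eq. (4): cross-products about the sample means, divided by N."""
    N = len(sample)
    n = len(sample[0])
    means = [sum(obs[i] for obs in sample) / N for i in range(n)]
    return [[sum((obs[i] - means[i]) * (obs[j] - means[j]) for obs in sample) / N
             for j in range(n)] for i in range(n)]


def determinant(a: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting (on a copy)."""
    m = [list(row) for row in a]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def sphericity_L(sample: Sequence[Sequence[float]]) -> float:
    """L_sn of eq. (11): |s_ij|**(1/2) / s**(n/2), where n*s = sum of the s_ii, eq. (9)."""
    s = mle_covariance(sample)
    n = len(s)
    s_bar = sum(s[i][i] for i in range(n)) / n
    # max() guards against a tiny negative determinant produced by rounding
    return max(determinant(s), 0.0) ** 0.5 / s_bar ** (n / 2)


square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]    # circular scatter
ellipse = [(2, 0), (-2, 0), (0, 1), (0, -1)]     # elongated scatter
print(sphericity_L(square), sphericity_L(ellipse))
```

For the two toy samples the statistic is 1 for the circular scatter and 0.8 for the elongated one, since there $|s_{ij}| = 1$ and s = 1.25.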

















3. The moments of the distribution of $L_{sn}$ when the population is spherical. The distribution of $L_{sn}$ cannot be easily obtained in explicit form for a general n, but the moments of $L_{sn}$ when the hypothesis tested is true are easily found. Note first that $L_{sn}$ may be resolved into two factors which are, when the population is spherical, statistically independent:

$$L_{sn} = \frac{(s_{11}s_{22}\cdots s_{nn})^{1/2}}{s^{n/2}}\cdot|r_{ij}|^{1/2}, \qquad r_{ij} = \frac{s_{ij}}{(s_{ii}s_{jj})^{1/2}}. \tag{12}$$

The first factor is just the one appropriate for testing the equality of the n variances when the orientation of the coordinate axes is fixed in advance, while the second factor is the square root of the determinant of correlation coefficients. The moments of the distributions of these two statistics are known [3], and since the two are independent (for zero correlation in the population), we may write:

$$M_h(L_{sn}) = M_h(A)\,M_h(B), \tag{13}$$

where A and B are used to indicate the two factors, and $M_h$ indicates the hth moment. The moments are given by


$$M_h(L_{sn}) = n^{nh/2}\,\frac{\Gamma[\tfrac12 n(N-1)]}{\Gamma[\tfrac12 n(N-1+h)]}\prod_{i=1}^{n}\frac{\Gamma[\tfrac12(N-i+h)]}{\Gamma[\tfrac12(N-i)]}. \tag{14}$$


4. Significance test for n = 2. For n = 1, $M_h(L_{s1}) = 1$ for any h, as it should, since $L_{s1}$ is then identically 1, and the concept of sphericity is meaningless. For n = 2, the expression (14) reduces to

$$M_h(L_{s2}) = \frac{\Gamma(N-2+h)\,\Gamma(N-1)}{\Gamma(N-1+h)\,\Gamma(N-2)} = \frac{N-2}{N-2+h}, \tag{15}$$

and the distribution is thus found to be

$$D(L_{s2}) = (N-2)\,L_{s2}^{N-3}\,dL_{s2}. \tag{16}$$

Thus for n = 2, the significance of the value $L_1$ of $L_{s2}$ obtained from a given sample of N points in a plane is simply

$$P(L_{s2} < L_1) = L_1^{N-2}. \tag{17}$$

These results for n = 2 were obtained by another method in [1].
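Equation (17) lends itself to a quick simulation check. The sketch below assumes independent standard normal coordinates (a spherical population with unit variance), draws many samples of N = 10 points with Python's `random` module, and compares the observed frequency of $L_{s2} < L_1$ with $L_1^{N-2}$. The sample size, seed and number of trials are arbitrary illustrative choices.

```python
import random


def L_s2(points):
    """L_s2 of eq. (11) with n = 2: sqrt(s11*s22 - s12**2) / ((s11 + s22) / 2)."""
    N = len(points)
    mx = sum(p[0] for p in points) / N
    my = sum(p[1] for p in points) / N
    s11 = sum((p[0] - mx) ** 2 for p in points) / N
    s22 = sum((p[1] - my) ** 2 for p in points) / N
    s12 = sum((p[0] - mx) * (p[1] - my) for p in points) / N
    return max(s11 * s22 - s12 ** 2, 0.0) ** 0.5 / ((s11 + s22) / 2)


def significance_n2(L1, N):
    """Eq. (17): P(L_s2 < L1) = L1**(N - 2) when the population is spherical."""
    return L1 ** (N - 2)


def monte_carlo_check(N=10, L1=0.8, trials=20000, seed=12345):
    """Observed frequency of L_s2 < L1 for samples from a standard bivariate normal."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(N)]
        if L_s2(pts) < L1:
            below += 1
    return below / trials, significance_n2(L1, N)


empirical, theoretical = monte_carlo_check()
print(empirical, theoretical)
```

With 20000 trials the standard error of the observed frequency is roughly 0.003, so agreement to about a hundredth is what one should expect.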


5. Significance test for n = 3. For n = 3 and higher values of n, no simple expression for the distribution seems obtainable. In this case it appears reasonable to fit a Pearson curve of the type

$$y = K x^{p-1}(1-x)^{q-1}, \tag{18}$$

by adjusting p and q so as to obtain agreement with the first two moments of the actual distribution. The calculations were carried out for $L_{s3}^2$ rather than $L_{s3}$ itself, to simplify the moment expressions. The first moment of $L_{s3}^2$ is the second moment of $L_{s3}$, and is given as a function of N by the equation

$$\mu_1'(N) = \frac{(3N-6)(3N-9)}{(3N-2)(3N-1)}. \tag{19}$$

Recurrence relations, similar to those noted by Lengyel [4] in carrying out a similar task, hold for the moments of $L_{s3}^2$; hence

$$\mu_2'(N) = \mu_1'(N)\,\mu_1'(N+2). \tag{20}$$

Explicit solution of the equations for p and q in terms of N is possible:

$$p = \frac{(9N+5)(N-2)(N-3)}{2(9N^2-8N-15)}, \tag{21}$$

$$q = \frac{2(9N-13)(9N+5)}{9(9N^2-8N-15)}. \tag{22}$$

For values of N > 30, acceptable approximations to p and q are obtained by carrying out the division indicated in (21) and (22):

$$p = \tfrac12(N-4) + \tfrac29 + \frac{70}{81(N+1)} + \cdots, \tag{23}$$

$$q = 2 + \frac{140}{9(9N^2-8N-15)}. \tag{24}$$

The values of p and q are given in Table I so that those desiring other than the standard significance levels may readily enter the Pearson tables.

For N a multiple of 4 from 8 to 48, and a multiple of 10 from 50 to 100, the significance levels were taken from the Incomplete Beta-Function Tables, using adequate interpolation. The final Table I was then prepared by filling in the skeleton table by interpolation with respect to N.

From the results of Wilks [5] it follows that $-2N\log_e L_{sn}$ is, for large N,
TABLE I

5%, 1%, and 0.1% levels of significance for the 3-dimensional sphericity criterion, $L_{s3}^2 = \lambda^{2/N}$, and the values of p and q for the Pearson Type I curves used in calculating these levels

     N      5%      1%     0.1%        p          q
     8    0.172   0.083   0.030     2.3239     2.0312
    10     .278    .165    .080     3.3044     2.0194
    12     .366    .243    .139     4.2911     2.0131
    14     .436    .312    .197     5.2816     2.0095
    16     .494    .372    .252     6.2744     2.0072
    18     .541    .423    .301     7.2688     2.0057
    20     .580    .466    .346     8.2642     2.0046
    22     .614    .504    .386     9.2605     2.0038
    24     .642    .538    .422    10.2574     2.0032
    26     .667    .567    .454    11.2548     2.0027
    28     .689    .593    .483    12.2526     2.0023
    30     .708    .616    .510    13.2506     2.0020
    32     .724    .637    .534    14.2488     2.0018
    34     .739    .655    .555    15.2473     2.0016
    36     .753    .672    .575    16.2458     2.0014
    38     .765    .687    .594    17.2447     2.0012
    40     .776    .701    .610    18.2435     2.0011
    42     .786    .714    .626    19.2425     2.0010
    44     .795    .726    .640    20.2416     2.0009
    46     .804    .736    .653    21.2408     2.0008
    48     .811    .746    .665    22.2400     2.0008
    50     .819    .756    .677    23.2394     2.0007
    55     .834    .776    .703        *          *
    60     .848    .793    .725    28.2365     2.0005
    65     .859    .808    .744        *          *
    70     .869    .821    .760    33.2345     2.0004
    75     .877    .832    .775        *          *
    80     .885    .842    .788    38.2328     2.0003
    85     .891    .851    .799        *          *
    90     .897    .859    .809    43.2317     2.0002
    95     .902    .866    .819        *          *
   100     .907    .872    .827    48.2308     2.0002

* No values for p and q were calculated for these values of N; the levels were obtained by interpolation (see text).


distributed approximately like χ² with n(n + 1)/2 − 1 degrees of freedom. However, equation (24) above suggests that for large N one may get a very good approximation (for n = 3) by setting q = 2; the significance test for n = 3 then becomes

$$P(L_{s3}^2 < L_1^2) = \tfrac12 L_1^{N-4}\left[(N-2) - (N-4)L_1^2\right]. \tag{25}$$
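The algebra leading from (19) and (20) to (21), (22) and (24) can be checked exactly with rational arithmetic. The sketch below (hypothetical helper names) fits the Beta curve (18) by the method of moments, using $p + q = (\mu_1' - \mu_2')/(\mu_2' - \mu_1'^2)$ for a Beta law, compares the result with the closed forms, and inverts the q = 2 approximation (25) by bisection to give approximate significance levels of $L_{s3}^2$. It takes (19) as printed.

```python
from fractions import Fraction


def mu1(N):
    """Eq. (19): first moment of L_s3**2."""
    N = Fraction(N)
    return (3 * N - 6) * (3 * N - 9) / ((3 * N - 2) * (3 * N - 1))


def fitted_p_q(N):
    """Match Beta(p, q), eq. (18), to mu1 and mu2 = mu1(N) * mu1(N + 2), eq. (20)."""
    m1 = mu1(N)
    m2 = m1 * mu1(N + 2)
    total = (m1 - m2) / (m2 - m1 ** 2)      # p + q for a Beta curve
    return m1 * total, (1 - m1) * total


def p_closed(N):
    N = Fraction(N)
    return (9 * N + 5) * (N - 2) * (N - 3) / (2 * (9 * N ** 2 - 8 * N - 15))   # eq. (21)


def q_closed(N):
    N = Fraction(N)
    return 2 * (9 * N - 13) * (9 * N + 5) / (9 * (9 * N ** 2 - 8 * N - 15))   # eq. (22)


def approx_significance(x, N):
    """Eq. (25) with x = L1**2; increases from 0 to 1 on [0, 1] for N > 4."""
    return 0.5 * x ** ((N - 4) / 2) * ((N - 2) - (N - 4) * x)


def approx_level(alpha, N, steps=60):
    """Value of L1**2 at which eq. (25) equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if approx_significance(mid, N) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p8, q8 = fitted_p_q(8)
print(float(p8), float(q8), approx_level(0.05, 100))
```

The exact fit at N = 8 rounds to the tabled 2.3239 and, within a unit in the last place, to 2.0312; the q = 2 approximation gives a 5% level near .907 at N = 100.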


Probably similar approximations can be found for other values of n. It is a 
pleasure to acknowledge the helpful comments and advice which I received 
from Mr. A. M. Mood of Princeton. Recognition is also due Mr. Wallace 
Brey, a student assistant under the National Youth Administration, who aided 
in the computations. 
A SIMPLE SAMPLING EXPERIMENT ON CONFIDENCE INTERVALS


By S. Kullback and A. Frankel


1. Introduction. In order to illustrate some of the notions of the theory of 
confidence or fiducial limits in connection with a course in Statistical Inference 
at the George Washington University, we had the class carry out certain simple 
experiments, following a suggestion in one of Neyman’s papers on Statistical 
Estimation [1]. In the belief that the experimental data may be of interest 
to others, we present the results herein. 


2. The problem. We consider the problem of estimating the range θ of a rectangular population defined by $p(x, \theta)\,dx = dx/\theta$, $0 \le x \le \theta$, and in particular, for simplicity, we limit ourselves to samples of two and four. We consider three possible approaches to the problem, viz., by using (a) the sample range, (b) the sample average or total, and (c) the larger (largest) sample value.
Let us consider each in turn. 

(a) Sample range. Wilks [2] has shown that for samples of n and confidence coefficient 1 − α, the confidence or fiducial limits for the population range θ are given by r and $r/\psi_\alpha$, where r is the sample range and $\psi_\alpha$ is determined by

$$\psi_\alpha^{\,n-1}\left[n - (n-1)\psi_\alpha\right] = \alpha. \tag{1}$$

For n = 2, α = 0.19 and n = 4, α = 0.1792, (1) yields $\psi_\alpha = 0.1$ and $\psi_\alpha = 0.4$ respectively. Accordingly, for samples of two with confidence coefficient 1 − α = 0.81, and for samples of four with confidence coefficient 1 − α = 0.8208, the confidence interval is respectively given by

$$(r,\ 10r) \quad\text{and}\quad (r,\ 2.5r). \tag{2}$$

The length, $\Delta_r$, of the confidence interval is respectively 9r and 1.5r. Using the distribution of r, $n(n-1)(\theta - r)r^{n-2}\,dr/\theta^n$, we have for samples of two: $E(\Delta_r) = 3\theta$, $\sigma_{\Delta_r} = 2.1213\theta$, and for samples of four: $E(\Delta_r) = 0.9\theta$, $\sigma_{\Delta_r} = 0.3\theta$.
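Equation (1) has no closed-form root for general n, but its left side is the distribution function of $r/\theta$, so simple bisection suffices. A minimal sketch in units θ = 1 (function names are hypothetical); it also reproduces the mean and standard deviation of the interval length from the fact that $r/\theta$ has the Beta(n − 1, 2) density $n(n-1)(1-u)u^{n-2}$.

```python
def range_factor(n, alpha, tol=1e-12):
    """Solve eq. (1), psi**(n-1) * (n - (n-1)*psi) = alpha, for psi in (0, 1).

    The left side is the distribution function of r/theta for a rectangular
    parent, so it increases from 0 to 1 and bisection is enough."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** (n - 1) * (n - (n - 1) * mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def range_interval_length(n, psi):
    """Mean and s.d. of the length r/psi - r, in units of theta (r/theta ~ Beta(n-1, 2))."""
    c = 1 / psi - 1
    mean_r = (n - 1) / (n + 1)
    var_r = 2 * (n - 1) / ((n + 1) ** 2 * (n + 2))
    return c * mean_r, c * var_r ** 0.5


for n, alpha in [(2, 0.19), (4, 0.1792)]:
    psi = range_factor(n, alpha)
    print(n, psi, range_interval_length(n, psi))
```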


(b) Sample total. Following Neyman [1, p. 357], let us denote by A(θ) the region defined by

$$\theta - A \le x_1 + x_2 \le \theta + A, \tag{3}$$

where θ is the population range, $x_1$ and $x_2$ the sample values of the sample $E_2$, and A is selected so as to have $P\{E_2 \in A(\theta) \mid \theta\} = 1 - \alpha$. It is readily found that $P\{E_2 \in A(\theta) \mid \theta\} = [\theta^2 - (\theta - A)^2]/\theta^2 = 1 - \alpha$, from which we find that $A = \theta(1 - \alpha^{1/2})$. Accordingly (3) becomes $\theta\alpha^{1/2} \le x_1 + x_2 \le \theta(2 - \alpha^{1/2})$, yielding the confidence limits $(x_1 + x_2)/(2 - \alpha^{1/2})$ and $(x_1 + x_2)/\alpha^{1/2}$. For the confidence coefficient 1 − α = 0.81 the confidence interval is given by

$$[0.6394(x_1 + x_2),\ 2.2941(x_1 + x_2)]. \tag{4}$$

The length of the confidence interval is given by $\Delta_T = 1.6547(x_1 + x_2)$, so that

$$E(\Delta_T) = 1.6547\theta, \qquad \sigma_{\Delta_T} = 0.6755\theta.$$
Let us denote by A′(θ) the region defined by

$$2\theta - A \le x_1 + x_2 + x_3 + x_4 \le 2\theta + A, \tag{5}$$

where θ is the population range, $x_1, x_2, x_3, x_4$ the sample values of the sample $E_4$, and A is selected so as to have $P\{E_4 \in A'(\theta) \mid \theta\} = 1 - \alpha$. Using the known distribution of the sample average [3] and 1 − α = 0.8208, it is readily found that

$$P\{E_4 \in A'(\theta) \mid \theta\} = \frac43\Big(\frac{A}{\theta}\Big) - \frac23\Big(\frac{A}{\theta}\Big)^3 + \frac14\Big(\frac{A}{\theta}\Big)^4 = 0.8208,$$

from which we find that A = 0.7880θ. Accordingly, (5) becomes $1.2120\theta \le x_1 + x_2 + x_3 + x_4 \le 2.7880\theta$, yielding the confidence interval

$$[0.3587(x_1 + x_2 + x_3 + x_4),\ 0.8251(x_1 + x_2 + x_3 + x_4)]. \tag{6}$$

The length of the confidence interval is given by $\Delta_T = 0.4664(x_1 + x_2 + x_3 + x_4)$, so that $E(\Delta_T) = 0.9328\theta$ and $\sigma_{\Delta_T} = 0.2680\theta$.
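The value of A for samples of four can be recovered numerically. The sketch below works in units θ = 1, solves the quartic for t = A/θ by bisection (the left side increases on 0 ≤ t ≤ 1), and checks the quartic against the distribution function of a sum of four rectangular variates on [1, 2], $F(x) = [x^4 - 4(x-1)^4]/24$, through $1 - 2F(2 - t)$. Bisection gives t ≈ 0.7879, which agrees with the 0.7880 quoted above to about four figures.

```python
def coverage_sum4(t):
    """P{E_4 in A'(theta)} as a function of t = A/theta, valid for 0 <= t <= 1."""
    return 4 / 3 * t - 2 / 3 * t ** 3 + 1 / 4 * t ** 4


def solve_t(target, tol=1e-12):
    """Bisection on [0, 1], where coverage_sum4 is increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if coverage_sum4(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t = solve_t(0.8208)
lower_total, upper_total = 2 - t, 2 + t     # bounds on x1 + x2 + x3 + x4, in units of theta
print(t, 1 / upper_total, 1 / lower_total)  # interval multipliers as in (6)
```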

(c) Larger (largest) sample value. Again following Neyman [1, p. 359], let us denote by $A_1(\theta)$ the region defined by

$$q\theta \le L \le \theta, \tag{7}$$

where θ is the population range, L the larger of the two sample values $x_1$ and $x_2$, and q a number between zero and unity, to be determined by $P\{E_2 \in A_1(\theta) \mid \theta\} = 1 - \alpha$. It is readily found that $P\{E_2 \in A_1(\theta) \mid \theta\} = (\theta^2 - q^2\theta^2)/\theta^2 = 1 - \alpha$, from which we find that $q = \alpha^{1/2}$. Accordingly, (7) becomes $\theta\alpha^{1/2} \le L \le \theta$, yielding the confidence limits L and $L/\alpha^{1/2}$. For the confidence coefficient 1 − α = 0.81 the confidence interval is given by

$$(L,\ 2.2941L). \tag{8}$$


TABLE I

Frequency distribution of the number of cases of coverage per set of 100 samples, by procedure (range, sum, larger (largest) value) and sample size (two, four)

[The individual frequencies are not legible in this copy. The number of sets is 39 for samples of two and 15 for samples of four; the legible averages are 84.2 (sum, samples of four), 80.2 (larger value, samples of two), and 82.1 (largest value, samples of four).]


The length of the confidence interval is given by $\Delta_L = 1.2941L$, so that, using the distribution of L, $nL^{n-1}\,dL/\theta^n$, we have $E(\Delta_L) = 0.8627\theta$ and $\sigma_{\Delta_L} = 0.3050\theta$. Incidentally, since $L \le x_1 + x_2$, we have $1.2941L < 1.6547(x_1 + x_2)$, so that in every case, for samples of two, the confidence interval of procedure (c) is shorter than the confidence interval of procedure (b).

For samples of four, we consider the region (7), where L is the largest of the sample values $x_1, x_2, x_3, x_4$ of the sample $E_4$. It is readily found that $P\{E_4 \in A_1(\theta) \mid \theta\} = (\theta^4 - q^4\theta^4)/\theta^4 = 1 - \alpha$, from which we find that $q = \alpha^{1/4}$. For α = 0.1792, q = 0.6506, so that (7) becomes $0.6506\theta \le L \le \theta$, yielding the confidence interval

$$(L,\ 1.5370L). \tag{9}$$

The length of the confidence interval is given by $\Delta_L = 0.5370L$, so that $E(\Delta_L) = 0.4296\theta$ and $\sigma_{\Delta_L} = 0.0878\theta$.






















TABLE II

                                        Range             Sum          Larger (largest)
                               Sample  Theo-   Ob-     Theo-   Ob-     Theo-   Ob-
                                size   retical served  retical served  retical served
Confidence coefficient            2    .8100   .8110   .8100   .802    .8100   .8020
                                  4    .8208   .8210   .8208   .842    .8208   .8210
Average length of confidence      2   3.0000  2.9660  1.6547  1.6441   .8627   .8556
  interval per set of 100         4    .9000   .8976   .9328   .9296   .4296   .4272
  samples
Standard deviation of average     2    .2121   .2133   .0676   .0581   .0305   .0293
  length of confidence interval   4    .0300   .0335   .0268   .0140   .0088   .0093














3. The Experimental Data. We considered the rectangular population with θ = 1 and obtained the sample values by using pairs of digits taken from Tippett's random sample tables [4]. Using these observed values, the confidence intervals given by (2), (4), (6), (8) and (9) were computed, and the number of cases in which the value θ = 1 was covered was noted. In all, 3900 samples of two were observed, subdivided into 39 sets of 100 each. The samples of four were obtained by combining pairs of samples of two; 1500 samples of four were studied, subdivided into 15 sets of 100 each. Table I gives the observed distribution of the number of cases of coverage per set of 100 samples of two and of four. The length of the confidence interval obtained by each of the three procedures was recorded, and the observed mean and standard deviation of the distribution of the average length of the confidence interval per set of 100 samples were computed. (Since they are averages of 100 values, these observations are practically normally distributed.) Table II summarizes these results.
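The sampling experiment is easy to repeat by computer. This sketch replaces Tippett's tables with Python's `random.random()` (an assumption, not the authors' procedure), forms samples of four whose first two values serve as the sample of two, and estimates the coverage and mean length of the intervals (2), (4), (6), (8) and (9) with θ = 1. The trial count and seed are arbitrary.

```python
import random

# (lower, upper) multipliers applied to each statistic, read from (2), (4), (6), (8), (9)
FACTORS = {
    "range2": (1.0, 10.0),
    "sum2": (0.6394, 2.2941),
    "larger2": (1.0, 2.2941),
    "range4": (1.0, 2.5),
    "sum4": (0.3587, 0.8251),
    "largest4": (1.0, 1.5370),
}


def statistics(x):
    """Statistics of a sample of four; its first two values form the sample of two."""
    pair = x[:2]
    return {
        "range2": max(pair) - min(pair),
        "sum2": sum(pair),
        "larger2": max(pair),
        "range4": max(x) - min(x),
        "sum4": sum(x),
        "largest4": max(x),
    }


def simulate(trials=50000, theta=1.0, seed=2024):
    """Return (coverage frequency, mean interval length) for each procedure."""
    rng = random.Random(seed)
    covered = dict.fromkeys(FACTORS, 0)
    total_length = dict.fromkeys(FACTORS, 0.0)
    for _ in range(trials):
        x = [theta * rng.random() for _ in range(4)]
        for name, value in statistics(x).items():
            low, high = FACTORS[name]
            if low * value <= theta <= high * value:
                covered[name] += 1
            total_length[name] += (high - low) * value
    coverage = {name: covered[name] / trials for name in FACTORS}
    mean_length = {name: total_length[name] / trials for name in FACTORS}
    return coverage, mean_length


coverage, mean_length = simulate()
for name in FACTORS:
    print(f"{name:9s} coverage {coverage[name]:.4f}  mean length {mean_length[name]:.4f}")
```

The simulated coverages should sit near .81 and .8208, and the mean lengths near the theoretical entries of Table II, up to sampling error.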
THE NUMERICAL COMPUTATION OF THE PRODUCT OF CONJUGATE
IMAGINARY GAMMA FUNCTIONS


By A. C. Cohen, Jr.


The difference equation

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + c_1x + c_2}{x^2 + c_3x + c_4} \tag{1}$$

was used by Professor Harry C. Carver [1] as the basis for graduating frequency distributions in a manner analogous to the use of the differential equation

$$\frac{1}{y}\frac{dy}{dx} = \frac{x - a}{b_0 + b_1x + b_2x^2}$$

in the Pearson system of frequency curves. In order to determine a particular $f_x$ by Professor Carver's method it was necessary to perform the complete graduation from the lower limit of the range up to and including the required $f_x$. When x is large and only isolated values of $f_x$ are required, it seems desirable to have a method for computing $f_x$ directly, and the present note seeks to accomplish this purpose.
It is well known [2] that the difference equation

$$\frac{f_{x+1}}{f_x} = \frac{a(x-\alpha_1)(x-\alpha_2)\cdots(x-\alpha_n)}{(x-\beta_1)(x-\beta_2)\cdots(x-\beta_m)} \tag{2}$$

has the solution

$$f_x = \omega_x\,a^x\,\frac{\Gamma(x-\alpha_1)\cdots\Gamma(x-\alpha_n)}{\Gamma(x-\beta_1)\cdots\Gamma(x-\beta_m)}, \tag{3}$$

where $\omega_x$ is a periodic function of x ($\omega_x = \omega_{x+1} = \cdots = k$), and Γ(z + 1), for z a positive real number, may be defined in the usual manner by the second Euler integral

$$\Gamma(z+1) = \int_0^\infty t^z e^{-t}\,dt, \tag{4}$$

which obeys the recursion formula

$$\Gamma(z+1) = z\,\Gamma(z). \tag{5}$$

When z is a positive integer,

$$\Gamma(z+1) = z!. \tag{6}$$






















Equation (1) is seen to be a special case of (2) for n = m = 2, and accordingly the solution may be written as

$$f_x = K\,\frac{\Gamma(x-\alpha_1)\Gamma(x-\alpha_2)}{\Gamma(x-\beta_1)\Gamma(x-\beta_2)}, \tag{7}$$

where $\alpha_1$ and $\alpha_2$ are roots of $x^2 + c_1x + c_2 = 0$ and $\beta_1$ and $\beta_2$ are roots of $x^2 + c_3x + c_4 = 0$. The following simple examples illustrate three special cases of this solution.

I. All α's and β's are integers.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 9x + 20}{x^2 + 5x + 6}$$

has the solution

$$f_x = K\,\frac{\Gamma(x+4)\Gamma(x+5)}{\Gamma(x+2)\Gamma(x+3)},$$

which, with the aid of recursion formula (5), can readily be verified by direct substitution.

II. Either the α's and/or the β's are real irrational numbers.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 5x + 6}{x^2 + 3x + 1}$$

has the solution

$$f_x = K\,\frac{\Gamma(x+2)\Gamma(x+3)}{\Gamma[x + \tfrac12(3-\sqrt5)]\,\Gamma[x + \tfrac12(3+\sqrt5)]},$$

which, with the aid of the recursion formula (5), can also be verified by direct substitution.

III. Either the α's and/or the β's are complex.

$$\frac{f_{x+1}}{f_x} = \frac{x^2 + 8x + 17}{x^2 + 10x + 29}$$

has the solution

$$f_x = K\,\frac{\Gamma(x+4+i)\Gamma(x+4-i)}{\Gamma(x+5+2i)\Gamma(x+5-2i)}.$$

Since the recursion formula (5) is also valid for complex arguments [3], this solution can be verified by direct substitution just as in the first two cases.
The evaluation of $f_x$ for a given x in cases I and II involves only computation of quantities of the form Γ(z), which can be accomplished through the use of existing tables of Gamma Functions for small values of z and through application of Stirling's formula for large values of z. Evaluation of $f_x$ in case III, however, involves the computation of quantities of the form Γ(u + iv)Γ(u − iv), a problem which seems to have escaped previous attention. The remainder of the present discussion will center about this quantity.

The Gamma Function for a real positive argument has been defined by equation (4), but for the present purposes it is more expedient to use the definition

$$\Gamma(z) = \lim_{n\to\infty}\frac{n!\,n^z}{z(z+1)\cdots(z+n)}, \tag{8}$$

which is valid for all values of the complex argument z except at the poles (z = 0, −1, −2, etc.). The above definition is equivalent to (4) at all points where (4) is valid [3].

From equation (8), it immediately follows that Γ(u + iv)Γ(u − iv) is a real number. In fact, we have

$$\Gamma(u+iv)\Gamma(u-iv) = \lim_{n\to\infty}\frac{(n!)^2n^{2u}}{(u^2+v^2)[(u+1)^2+v^2]\cdots[(u+n)^2+v^2]}.$$


We now develop a formula applicable in evaluating this quantity when u is a sufficiently small positive integer. As a consequence of equation (8) it can be shown that [3]

$$\Gamma(z)\Gamma(1-z) = \frac{\pi}{\sin\pi z}. \tag{9}$$

Let z = iv in the above equation (and use $\Gamma(1 - iv) = -iv\,\Gamma(-iv)$ from (5)); we immediately obtain the result

$$\Gamma(iv)\Gamma(-iv) = \frac{2\pi}{v(e^{\pi v} - e^{-\pi v})}. \tag{10}$$

When u is a positive integer, we may write

$$\Gamma(u+iv) = (u-1+iv)(u-2+iv)\cdots(iv)\,\Gamma(iv), \tag{11}$$

$$\Gamma(u-iv) = (u-1-iv)(u-2-iv)\cdots(-iv)\,\Gamma(-iv). \tag{12}$$

The product of (11) by (12) gives

$$\Gamma(u+iv)\Gamma(u-iv) = v^2(v^2+1)\cdots[v^2+(u-1)^2]\,\Gamma(iv)\Gamma(-iv),$$

which, upon substitution of the value found in equation (10) for Γ(iv)Γ(−iv), becomes

$$\Gamma(u+iv)\Gamma(u-iv) = \frac{2\pi v}{e^{\pi v} - e^{-\pi v}}\prod_{k=1}^{u-1}(v^2 + k^2). \tag{13}$$
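To obtain a result that is applicable when u is not a positive integer, we make use of Stirling's formula for complex arguments. Lipschitz [4] proves

$$\operatorname{Log}\Gamma(z) = \log\sqrt{2\pi} + (z - \tfrac12)\log z - z + \sum_{j=0}^{m}\frac{(-1)^j B_{2j+1}}{(2j+1)(2j+2)}\,\frac{1}{z^{2j+1}} \tag{14}$$

and that the remainder after the mth term is

$$\frac{(-1)^{m+1}B_{2m+3}}{(2m+3)(2m+4)}\,\frac{\epsilon}{z^{2m+3}},$$

where $|\epsilon| < 1$. $B_{2m+1}$ designates the Bernoulli numbers ($B_1 = \tfrac16$, $B_3 = \tfrac1{30}$, $B_5 = \tfrac1{42}$, etc.). We are thus able to write, with $\varphi = \tan^{-1}(v/u)$ and $R = \sqrt{u^2 + v^2}$,

$$\operatorname{Log}\Gamma(u+iv) = \log\Gamma(Re^{i\varphi}) = \log\sqrt{2\pi} + (Re^{i\varphi} - \tfrac12)(\log R + i\varphi) - Re^{i\varphi} + \sum_{m=0}^{\infty}\frac{(-1)^m B_{2m+1}}{(2m+1)(2m+2)}\,\frac{e^{-(2m+1)i\varphi}}{R^{2m+1}}, \tag{15}$$

$$\operatorname{Log}\Gamma(u-iv) = \log\Gamma(Re^{-i\varphi}) = \log\sqrt{2\pi} + (Re^{-i\varphi} - \tfrac12)(\log R - i\varphi) - Re^{-i\varphi} + \sum_{m=0}^{\infty}\frac{(-1)^m B_{2m+1}}{(2m+1)(2m+2)}\,\frac{e^{(2m+1)i\varphi}}{R^{2m+1}}. \tag{16}$$

Adding (15) and (16), we obtain

$$\operatorname{Log}\Gamma(u+iv)\Gamma(u-iv) = \log 2\pi + (e^{i\varphi} + e^{-i\varphi})R\log R - \log R + Ri\varphi(e^{i\varphi} - e^{-i\varphi}) - R(e^{i\varphi} + e^{-i\varphi}) + \sum_{m=0}^{\infty}\frac{(-1)^m B_{2m+1}}{(2m+1)(2m+2)}\left(e^{(2m+1)i\varphi} + e^{-(2m+1)i\varphi}\right)\frac{1}{R^{2m+1}},$$

which, since $R\cos\varphi = u$ and $R\sin\varphi = v$, simplifies to

$$\operatorname{Log}\Gamma(u+iv)\Gamma(u-iv) = \log 2\pi + (2u-1)\log R - 2(v\varphi + u) + \psi(R,\varphi), \tag{17}$$

where

$$\psi(R,\varphi) = \sum_{m=0}^{\infty}\frac{(-1)^m\,2B_{2m+1}}{(2m+1)(2m+2)}\,\frac{\cos(2m+1)\varphi}{R^{2m+1}}. \tag{18}$$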

This result is somewhat similar to that obtained by Karl Pearson [5] in connection with the evaluation of the G(r, ν) integrals of his Type IV frequency curve. If R > 1, the expansion of ψ(R, φ) is asymptotic, and the greatest numerical value that the mth term can have is

$$\frac{B_{2m+1}}{(2m+1)(2m+2)}\,\frac{1}{R^{2m+1}}.$$

Thus, according to Lipschitz's results, the error committed in dropping all terms after the mth will not exceed

$$\pm\frac{B_{2m+1}}{(2m+1)(2m+2)}\,\frac{1}{R^{2m+1}}.$$

The following table gives an indication of the size of the error:

    Terms omitted after    Error committed in ψ(R, φ) less than
    1st                    ±.0833 3333/$R$
    2nd                    ±.0027 7777/$R^3$
    3rd                    ±.0007 9365/$R^5$
    4th                    ±.0005 9524/$R^7$
    5th                    ±.0008 4175/$R^9$

It is now obvious that formula (18) will give satisfactory results whenever R is sufficiently large. The degree of accuracy required, together with the value of R, will determine the number of terms of ψ(R, φ) to be computed.
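Formulas (13) and (17) to (18) can be compared directly wherever u is an integer. The sketch below evaluates the natural logarithm of Γ(u + iv)Γ(u − iv) both ways for the two products that enter the worked example of Case III, using the old-style Bernoulli numbers $B_1 = 1/6$, $B_3 = 1/30$, $B_5 = 1/42$, $B_7 = 1/30$, $B_9 = 5/66$ (the last two are standard values not listed in the text). Dividing by ln 10 should reproduce the common logarithms 31.5892 259 and 34.0812 782 used in that example.

```python
import math

# Old-style Bernoulli numbers B_1, B_3, B_5, B_7, B_9 as used in eq. (18)
BERNOULLI = [1 / 6, 1 / 30, 1 / 42, 1 / 30, 5 / 66]


def log_product_integer_u(u, v):
    """Natural log of Gamma(u+iv)Gamma(u-iv) from eq. (13); u a positive integer, v != 0."""
    # 2*pi*v / (e^{pi v} - e^{-pi v}) equals pi*v / sinh(pi*v)
    total = math.log(math.pi * v / math.sinh(math.pi * v))
    for k in range(1, u):
        total += math.log(v * v + k * k)
    return total


def log_product_stirling(u, v, terms=1):
    """Eqs. (17)-(18): log 2pi + (2u - 1) log R - 2(v*phi + u) + psi(R, phi)."""
    R = math.hypot(u, v)
    phi = math.atan2(v, u)
    psi = sum((-1) ** m * 2 * BERNOULLI[m] / ((2 * m + 1) * (2 * m + 2) * R ** (2 * m + 1))
              * math.cos((2 * m + 1) * phi) for m in range(terms))
    return math.log(2 * math.pi) + (2 * u - 1) * math.log(R) - 2 * (v * phi + u) + psi


for u, v in [(19, 1), (20, 2)]:
    exact = log_product_integer_u(u, v)
    print(u, v, exact / math.log(10), log_product_stirling(u, v) - exact)
```

Keeping only the first term of ψ(R, φ) already agrees with (13) to better than 10⁻⁵ here, as the error table suggests for R near 20.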

We now turn to the solution of the example under Case III and proceed to calculate $f_4$, $f_{15}$, and $f_{150}$ when $f_0 = 29$. We may write

$$K = 29\,\frac{\Gamma(5+2i)\Gamma(5-2i)}{\Gamma(4+i)\Gamma(4-i)}.$$

Application of formula (13) gives

$$\Gamma(5+2i)\Gamma(5-2i) = 244.043\,645, \qquad \Gamma(4+i)\Gamma(4-i) = 27.202\,292,$$

from which K = 260.171 676, and

$$f_4 = 260.171\,676\,\frac{\Gamma(8+i)\Gamma(8-i)}{\Gamma(9+2i)\Gamma(9-2i)}.$$

Again making use of formula (13), we have

$$f_4 = 260.171\,676 \times \frac{22{,}243{,}314}{1{,}020{,}258{,}635} = 5.6722.$$

Similarly,

$$f_{15} = 260.171\,676\,\frac{\Gamma(19+i)\Gamma(19-i)}{\Gamma(20+2i)\Gamma(20-2i)}.$$

Since R is fairly large in this instance, formula (17) is used and all terms of ψ(R, φ) after the first are dropped. This gives

$$\log\Gamma(19+i)\Gamma(19-i) = 31.5892\,259, \qquad \log\Gamma(20+2i)\Gamma(20-2i) = 34.0812\,782.$$

Accordingly, $\log f_{15} = 9.9232\,071 - 10$ and $f_{15} = .8379$.

By the same method $f_{150}$ is calculated, and we find $f_{150} = .008723$.

As a check on the accuracy of the results obtained in the above computations, values of $f_x$ for x ranging from 1 to 15 were computed, using the given equation as a recursion formula. That is,

$$f_1 = \tfrac{17}{29}(29) = 17, \qquad f_2 = \tfrac{26}{40}(17) = 11.05, \quad\text{etc.}$$

These results are given in the following table, and it is to be noted that the values in the table for $f_4$ and $f_{15}$ agree with those previously computed by use of formulas contained in this paper. For obvious reasons, no attempt was made to compute the value of $f_{150}$ by this method.


TABLE I

     x      f(x)        x      f(x)        x      f(x)
     0    29.0000       5     4.3375      10     1.6228
     1    17.0000       6     3.4200      11     1.3961
     2    11.0500       7     2.7633      12     1.2135
     3     7.7142       8     2.2779      13     1.0644
     4     5.6722       9     1.9092      14     0.9411
                                          15     0.8379
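As a further check on Table I, the sketch below evaluates the Case III solution through (13), working with logarithms so that large x cannot overflow, and compares it with direct use of the difference equation as a recursion. The function names are hypothetical; K is fixed from $f_0 = 29$ as in the text. The two routes agree to rounding error, and both match the tabled values to within a unit or so in the fourth decimal.

```python
import math


def log_gamma_conj_product(u, v):
    """log[Gamma(u+iv) Gamma(u-iv)] from eq. (13); u a positive integer, v != 0."""
    total = math.log(math.pi * v / math.sinh(math.pi * v))
    for k in range(1, u):
        total += math.log(v * v + k * k)
    return total


def f_closed(x, f0=29.0):
    """f_x = K Gamma(x+4+i)Gamma(x+4-i) / [Gamma(x+5+2i)Gamma(x+5-2i)], K set by f_0."""
    log_K = math.log(f0) + log_gamma_conj_product(5, 2) - log_gamma_conj_product(4, 1)
    return math.exp(log_K + log_gamma_conj_product(x + 4, 1)
                    - log_gamma_conj_product(x + 5, 2))


def f_recursive(x, f0=29.0):
    """Step f_{t+1} = f_t (t^2 + 8t + 17) / (t^2 + 10t + 29) from t = 0."""
    f = f0
    for t in range(x):
        f *= (t * t + 8 * t + 17) / (t * t + 10 * t + 29)
    return f


for x in (4, 15):
    print(x, f_closed(x), f_recursive(x))
```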
COMPARISON OF PEARSONIAN APPROXIMATIONS WITH EXACT
SAMPLING DISTRIBUTIONS OF MEANS AND VARIANCES
IN SAMPLES FROM POPULATIONS COMPOSED OF
THE SUMS OF NORMAL POPULATIONS


By G. A. Baker


1. Introduction. Biological and sociological data are often “non-homogeneous”
and of such a nature as not to be easily separated into components.
Non-homogeneous populations have been discussed by Karl Pearson, Charlier, 
and others. Non-normal material has been discussed by many writers. See 
for example, A. E. R. Church [1] and J. M. LeRoux [2] for a discussion of 
moments of the distributions of the means and variances for samples from 
non-normal material. 

In a previous paper [3] the author has given the distributions of the means 
and standard deviations of samples from certain non-homogeneous populations. 
The purpose of the present paper is to extend the results given in [3] and to 
compare the moment approach of the Pearsonian school with the true distri- 
butions. 


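2. Moments of the distribution of means of samples of n from a non-homogeneous population. Consider a population with distribution

$$f(x) = \frac{1}{(1+k)\sqrt{2\pi}}\left[e^{-x^2/2} + \frac{k}{\sigma}e^{-(x-m)^2/(2\sigma^2)}\right]. \tag{2.1}$$

The first four moments of (2.1) about x = 0 are

$$\nu_1' = \frac{km}{1+k}, \quad \nu_2' = \frac{1 + k(\sigma^2 + m^2)}{1+k}, \quad \nu_3' = \frac{k(3\sigma^2m + m^3)}{1+k}, \quad \nu_4' = \frac{3 + k(3\sigma^4 + 6m^2\sigma^2 + m^4)}{1+k}. \tag{2.2}$$

The means of samples of n drawn at random from (2.1) are distributed according to

$$f(\bar x) = \frac{n}{\sqrt{2\pi}\,(1+k)^n}\sum_{s=0}^{n}\binom{n}{s}\frac{k^s}{\sqrt{s\sigma^2 + n - s}}\exp\left[-\frac{n^2(\bar x - sm/n)^2}{2(s\sigma^2 + n - s)}\right]. \tag{2.3}$$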
Denote by $m_r'$ the moments of (2.3) about $\bar x = 0$ and by $m_r$ the moments about the mean. Then in view of the relations

$$\frac{n!\,s^\alpha}{(n-s)!\,s!} = \sum_{r=1}^{\alpha}A_{\alpha r}\,\frac{n!}{(n-s)!\,(s-r)!}, \tag{2.4}$$

$$A_{\alpha 1} = 1, \quad A_{\alpha\alpha} = 1, \quad A_{32} = 3, \quad A_{42} = 7, \quad A_{43} = 6, \quad A_{52} = 15, \quad A_{53} = 25, \quad A_{54} = 10,$$

and similar relations, and reduction to moments about the mean, we obtain


/ 
Mm = 

















+ gee + eal 
ms = aT | 3k(nk + lot + 8(n + k) + 6(n — Ike” 
+ y+ (a — Dm? + 8% fin = yk + Am? 
_ + qqgp if + Bn - e+ 1}m| 
ms = ac | 15ten — 1)k + 1}mo* — 15 {k + (2n — 1)}m 


+ 30(n — 1)(1 — k)mo’ 


10 e a 2 - 3 2 
+ i+tk (n Ik + 4(n Ik + 1}m'o 


+4 {— KB — 4(n — Dk + (n — 1)}m! 









1 3 2 5 
+ Gap (— B+ (= 10n + 11) + (10n — 1)k-+ 1m. 


The expressions for the first five moments agree with the results given by 
Church and Tchebycheff. 
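The coefficients $A_{\alpha r}$ listed with (2.4) coincide with the Stirling numbers of the second kind. Assuming that identification, this sketch generates them by the usual recurrence and checks the relation in the equivalent form $s^\alpha = \sum_r A_{\alpha r}\,s!/(s-r)!$, together with the binomial sum it serves to reduce, $\sum_s \binom{n}{s}k^s s^\alpha = \sum_r A_{\alpha r}\frac{n!}{(n-r)!}k^r(1+k)^{n-r}$. Exact rational arithmetic is used for the check; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def stirling2_table(max_alpha):
    """A[alpha][r] from A(alpha, r) = r*A(alpha-1, r) + A(alpha-1, r-1), A(0, 0) = 1."""
    A = [[0] * (max_alpha + 1) for _ in range(max_alpha + 1)]
    A[0][0] = 1
    for a in range(1, max_alpha + 1):
        for r in range(1, a + 1):
            A[a][r] = r * A[a - 1][r] + A[a - 1][r - 1]
    return A


def falling(s, r):
    """s(s-1)...(s-r+1) = s!/(s-r)!, which is 0 when r > s."""
    out = 1
    for j in range(r):
        out *= s - j
    return out


def binomial_power_sum(n, k, alpha, A):
    """sum_s C(n, s) k^s s^alpha, reduced with the A's as in (2.4)."""
    return sum(A[alpha][r] * falling(n, r) * k ** r * (1 + k) ** (n - r)
               for r in range(1, alpha + 1))


A = stirling2_table(5)
print(A[3][2], A[4][2], A[4][3], A[5][2], A[5][3], A[5][4])
n, k = 7, Fraction(1, 3)
print([binomial_power_sum(n, k, a, A) for a in range(1, 6)])
```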
The betas of (2.3) are

$${}_1\beta_1 = \frac{k^2m^2\left[3\sigma^2 - 3 + \frac{1-k}{1+k}m^2\right]^2}{n(1+k)\left[1 + k\sigma^2 + \frac{k}{1+k}m^2\right]^3}, \tag{2.6}$$

$${}_1\beta_2 - 3 = \frac{k\left[3(\sigma^2-1)^2 + 6\,\frac{1-k}{1+k}(\sigma^2-1)m^2 + \frac{k^2-4k+1}{(1+k)^2}m^4\right]}{n\left[1 + k\sigma^2 + \frac{k}{1+k}m^2\right]^2}. \tag{2.7}$$

${}_1\beta_1$ vanishes if k = 0, m = 0, or k = 1 and σ = 1. If k and σ are constant and m approaches infinity, ${}_1\beta_1$ approaches $(1-k)^2/nk$. If k and m are constant and σ approaches infinity, ${}_1\beta_1$ approaches zero. ${}_1\beta_2 - 3$ vanishes if k = 0, k = ∞, or if m = 0 and σ = 1. If k and σ are constant and m approaches infinity, then ${}_1\beta_2 - 3$ approaches $(k^2 - 4k + 1)/nk$. If k and m are constant and σ approaches infinity, then ${}_1\beta_2 - 3$ approaches 3/nk.

TABLE I

$m_5$ and ${}_pm_5$ compared for four sets of values of k, σ², and m

[Only the column heads (k, σ, m) and one set of values, 1/4, 1/4, 0.5, are legible in this copy; the remaining entries could not be recovered.]
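Because the mean of n observations has cumulants $\kappa_r/n^{r-1}$, the betas of (2.3) can also be computed numerically from the central moments of (2.1). This sketch does so and compares the result with the closed forms (2.6) and (2.7) for a few arbitrarily chosen parameter sets; the helper names are hypothetical.

```python
from math import comb


def normal_central_moment(j, sd):
    """E[(sd*Z)^j] for standard normal Z: 0 for odd j, sd^j (j-1)!! for even j."""
    if j % 2:
        return 0.0
    double_fact = 1
    for i in range(j - 1, 0, -2):
        double_fact *= i
    return sd ** j * double_fact


def mixture_central_moments(k, sigma, m, up_to=4):
    """Central moments mu_2 .. mu_up_to of population (2.1)."""
    weights = (1 / (1 + k), k / (1 + k))
    means = (0.0, m)
    sds = (1.0, sigma)
    mu = sum(w * c for w, c in zip(weights, means))
    return {r: sum(w * sum(comb(r, j) * (c - mu) ** (r - j) * normal_central_moment(j, sd)
                           for j in range(r + 1))
                   for w, c, sd in zip(weights, means, sds))
            for r in range(2, up_to + 1)}


def betas_of_means(k, sigma, m, n):
    """beta_1 and beta_2 - 3 of the sample mean, from cumulants kappa_r / n^(r-1)."""
    mu = mixture_central_moments(k, sigma, m)
    K2, K3, K4 = mu[2] / n, mu[3] / n ** 2, (mu[4] - 3 * mu[2] ** 2) / n ** 3
    return K3 ** 2 / K2 ** 3, K4 / K2 ** 2


def betas_closed(k, sigma, m, n):
    """Eqs. (2.6) and (2.7)."""
    base = 1 + k * sigma ** 2 + k * m ** 2 / (1 + k)
    b1 = (k ** 2 * m ** 2 * (3 * sigma ** 2 - 3 + (1 - k) / (1 + k) * m ** 2) ** 2
          / (n * (1 + k) * base ** 3))
    b2m3 = (k * (3 * (sigma ** 2 - 1) ** 2 + 6 * (1 - k) / (1 + k) * (sigma ** 2 - 1) * m ** 2
                 + (k ** 2 - 4 * k + 1) / (1 + k) ** 2 * m ** 4)
            / (n * base ** 2))
    return b1, b2m3


for params in [(0.25, 0.5, 0.5, 4), (1.0, 1.0, 3.4, 4), (2.0, 1.5, 1.0, 10)]:
    print(params, betas_of_means(*params), betas_closed(*params))
```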

It is of interest to compare the higher moments of (2.3) with the higher moments calculated from the first four moments on the assumption of a Pearson curve in place of (2.3). On this assumption

$${}_pm_5 = \frac{2m_3\left(m_4^2 + 7m_2^2m_4 - 3m_2m_3^2\right)}{9m_2^3 - m_2m_4 + 3m_3^2}. \tag{2.8}$$

It is seen that (2.8) bears little resemblance to $m_5$. If we consider the difference ${}_pm_5 - m_5$, we see that it is of the same order in 1/n as is $m_5$, and the numerator is of the 16th degree in k, m, and σ: a very complicated locus. $m_5$ and ${}_pm_5$ are compared for certain values of the parameters of (2.1) in Table I. Table I shows that the coefficients of $1/n^3$ in the expressions for $m_5$ and ${}_pm_5$ differ by from two to more than 40 per cent. The coefficients of $1/n^4$ differ even more. The assumption of Karl Pearson's curves to represent the distribution of means of samples of n from non-homogeneous populations seems to be adequate in some cases but inadequate in others, even for moderate values of the parameters.
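Formula (2.8) can be derived from the moment recurrence of the Pearson differential equation taken about the mean. A quick exact check, assuming the Gamma (Type III) curve as a test case because its central moments are known in closed form ($\mu_2 = a$, $\mu_3 = 2a$, $\mu_4 = 3a^2 + 6a$, $\mu_5 = 20a^2 + 24a$ for shape a and unit scale); the normal curve gives zero, as it should.

```python
from fractions import Fraction


def pearson_m5(m2, m3, m4):
    """Eq. (2.8): fifth central moment implied by a Pearson curve with moments m2, m3, m4."""
    return (2 * m3 * (m4 ** 2 + 7 * m2 ** 2 * m4 - 3 * m2 * m3 ** 2)
            / (9 * m2 ** 3 - m2 * m4 + 3 * m3 ** 2))


for a in (Fraction(1), Fraction(2), Fraction(7, 2)):
    # Gamma(shape a, scale 1): compare (2.8) with the known fifth central moment
    print(a, pearson_m5(a, 2 * a, 3 * a ** 2 + 6 * a), 20 * a ** 2 + 24 * a)
```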


3. Moments of the distribution of variances. In [3] an estimate of n times the standard deviation squared is expressed as

$$W = (n-s)\bar\sigma_1^2 + s\bar\sigma_2^2 + \frac{(n-s)s}{n}(\bar m_1 - \bar m_2)^2, \tag{3.1}$$

where a bar over a letter means an estimate of the corresponding population parameter and where (n − s) denotes the number drawn from the first component of (2.1) and s denotes the number from the second component.

For the direct calculation of the moments of the distribution of variances it is easier not to use the distribution given in [3], but to proceed as follows. Put

$$(n-s)\bar\sigma_1^2 = x, \qquad s\bar\sigma_2^2 = y, \qquad \frac{(n-s)s}{n}(\bar m_1 - \bar m_2)^2 = z.$$

Of course, for population (2.1) $\sigma_1 = 1$, $\sigma_2 = \sigma$, $m_1 = 0$, $m_2 = m$. The variables x, y, z are all independent in the probability sense, and their probability distributions are well known. Hence the moments of

$$\frac{W}{n} = \frac{x + y + z}{n} \tag{3.2}$$

can be directly calculated.
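Identity (3.1) is the usual split of the total sum of squares into within-component and between-component parts, with $\bar\sigma_i^2$ taken as the variance of each group about its own mean (divisor equal to the group size). The sketch checks it on made-up values, and also evaluates $E(W/n) = (n-1)\mu_2/n$ for population (3.4), taking k = 1, σ = 1 and m = 3.4, which is what the listed moments ν₁ = 1.7000 and ν₂ = 3.8900 imply; the result should be close to the ₂M′₁ = 2.918 reported for samples of 4.

```python
def sum_sq(values):
    """Sum of squared deviations about the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def W_decomposed(first, second):
    """Right side of (3.1); (n - s)*sigma1_bar**2 is the within-group sum of squares."""
    n1, n2 = len(first), len(second)          # n - s and s
    n = n1 + n2
    m1, m2 = sum(first) / n1, sum(second) / n2
    return sum_sq(first) + sum_sq(second) + n1 * n2 / n * (m1 - m2) ** 2


def mean_W_over_n(k, sigma, m, n):
    """E(W/n) = (n - 1) mu_2 / n, with mu_2 the variance of population (2.1)."""
    mu2 = (1 + k * sigma ** 2 + k * m ** 2 / (1 + k)) / (1 + k)
    return (n - 1) / n * mu2


first, second = [0.3, -1.2, 0.8], [3.1, 4.0, 2.6, 3.3]   # made-up component samples
print(W_decomposed(first, second), sum_sq(first + second))
print(mean_W_over_n(1.0, 1.0, 3.4, 4))
```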
For instance, if p = 1, then

$$M_1' = \frac{n-1}{n(1+k)}\left[k\sigma^2 + 1 + \frac{k}{1+k}m^2\right]. \tag{3.3}$$

In general, of course, the moments about the mean check with the values given by Church.

It is generally recommended to represent the distributions of variances of samples from non-normal parents by Pearson's curves. Let us examine the results of this procedure in a special case.

Suppose that the sampled population is

$$f(x) = \frac{1}{2\sqrt{2\pi}}\left[e^{-x^2/2} + e^{-(x-3.4)^2/2}\right]. \tag{3.4}$$

The first eight moments of (3.4), which are needed in the calculation of the first four moments of the variances, are:

$$\nu_1 = 1.7000, \qquad \nu_2 = 3.8900, \qquad \nu_5 = 0.$$

[The remaining values are not legible in this copy.]

Fig. 1. Comparison of the true distribution of the variances of samples of 4 drawn from the non-homogeneous population (3.4) with the corresponding empirical Pearson curve.


The first four moments of the variances of samples of 4 from (3.4) are:

$${}_2M_1' = 2.918, \qquad {}_2M_2 = 3.396, \qquad {}_2M_3 = 4.745, \qquad {}_2M_4 = 41.52. \tag{3.6}$$

Hence ${}_2\beta_1 = .60$ and ${}_2\beta_2 = 3.6$, κ = −.87, which calls for a Type I curve. The
equation of the curve is


r 2.191 r 15.84 
(3.7) yi = 0.2281 (1 +5 —) (1 — x) 


with its origin at its mode. The corresponding true distribution with the
origin at the beginning of the range is 


$$y_2 = e^{-x}\left[.3989\sqrt{x} + .003550\sinh(3.4\sqrt{3x}) + .0005454\sinh(6.8\sqrt{x})\right]. \tag{3.8}$$

Distribution (3.8) differs slightly from the corresponding result given in [3] because of an error in that paper.

The two distributions are compared in Figure 1. It is seen that the two distributions are quite different. As the number of components of distributions similar to (3.8) increases, which is true as n increases, the distributions may be expected to become smoother and more closely representable by a single smooth curve.





4. Summary. The moments of the distribution of the means of samples of n 
from a non-homogeneous population composed of two normal components are 
given up to and including the fifth. This fifth moment is compared with the 
fifth moment calculated on the assumption of Pearson’s curves to represent 
the distribution of means. The β's of the distributions of the means are discussed
in certain limiting cases. It appears that for small samples and extreme
values of the parameters, and in some cases of moderate values of the parame- 
ters, the Pearsonian approximations give poor results. 

Some identities involving the binomial coefficients are given which permit 
the reduction of the moments of the distribution of means calculated directly 
to forms given elsewhere [1]. A method is given for the direct calculation of 
the moments of the variances of samples from a non-homogeneous population 
composed of two normal components. An indication of the closeness with 
which a Pearson curve can be made to fit the distribution of variances in small 
samples from a non-homogeneous population is given in Figure 1. 
A LEAST SQUARES ACCUMULATION THEOREM
By W. E. Bleick


The following simple least squares theorem does not seem to have been mentioned in the literature, and it has at least one practical application.

If $A^*(x)$ and $B^*(x)$ are polynomials of the same degree which are least squares representations of the functions $A(x)$ and $B(x)$, respectively, for the values $x_1, x_2, x_3, \ldots, x_p$, then

(1) $$\sum_{t=1}^{p} A^*(x_t)B(x_t) = \sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t).$$
To prove the theorem, let

(2) $$A^*(x) = \sum_{i=0}^{m} a_i x^i$$

and

(3) $$B^*(x) = \sum_{j=0}^{n} b_j x^j.$$
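Then the normal equations for the determination of the $a_i$ and $b_j$ are

(4) $$\sum_{i=0}^{m} a_i s_{i+k} = \sum_{t=1}^{p} x_t^k A(x_t), \qquad k = 0, 1, 2, \ldots, m,$$

and

(5) $$\sum_{j=0}^{n} b_j s_{j+h} = \sum_{t=1}^{p} x_t^h B(x_t), \qquad h = 0, 1, 2, \ldots, n,$$

where $s_r = \sum_{t=1}^{p} x_t^r$.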
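For the straight-line case $m = 1$, which is the case used in the application below, the normal equations (4) can be written out in full as a worked illustration, with $s_r = \sum_{t=1}^{p} x_t^r$ (so that $s_0 = p$):

```latex
\begin{aligned}
a_0\,p + a_1 s_1 &= \sum_{t=1}^{p} A(x_t), \\
a_0 s_1 + a_1 s_2 &= \sum_{t=1}^{p} x_t A(x_t),
\end{aligned}
\qquad\text{so that}\qquad
a_1 = \frac{p\sum_{t} x_t A(x_t) - s_1 \sum_{t} A(x_t)}{p\,s_2 - s_1^2},
\qquad
a_0 = \frac{1}{p}\Big(\sum_{t} A(x_t) - a_1 s_1\Big).
```

The same pair, with $b_0, b_1$ and $B$ in place of $a_0, a_1$ and $A$, is (5) for $n = 1$.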


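Hence, by (2) and (5),

(6) $$\begin{aligned}
\sum_{t=1}^{p} A^*(x_t)B(x_t) &= \sum_{t=1}^{p}\Big[\sum_{i=0}^{m} a_i x_t^i\Big]B(x_t) = \sum_{i=0}^{m} a_i \sum_{t=1}^{p} x_t^i B(x_t) \\
&= \sum_{i=0}^{m}\sum_{j=0}^{n} a_i b_j s_{i+j} = \sum_{t=1}^{p} A^*(x_t)B^*(x_t), \qquad \text{if } n \ge m.
\end{aligned}$$

The third equality uses (5) with $h = i$, which is available for every $i = 0, 1, \ldots, m$ only when $n \ge m$.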


Similarly, it can be shown that

(7) $$\sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t), \qquad \text{if } m \ge n.$$
Combining (6) and (7), we have

(8) $$\sum_{t=1}^{p} A^*(x_t)B(x_t) = \sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t), \qquad \text{if } m = n.$$


In the particular case $A(x) = B(x)$, equation (8) gives the interesting result

(9) $$\sum_{t=1}^{p} A^*(x_t)\left[A(x_t) - A^*(x_t)\right] = 0.$$

An obvious extension of equation (6) is

(10) $$\sum_{t=1}^{p} x_t^q A^*(x_t)B(x_t) = \sum_{t=1}^{p} x_t^q A^*(x_t)B^*(x_t), \qquad \text{if } n \ge m + q,$$

where $q$ is a positive integer.
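The identities (6) through (10) can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it solves the normal equations (4) and (5) in exact rational arithmetic, so that the equalities hold exactly rather than up to rounding, and the data values and helper names (`lsq_poly`, `fitted`, `three_sums`) are hypothetical.

```python
from fractions import Fraction


def lsq_poly(xs, ys, degree):
    """Coefficients c_0..c_degree of the least squares polynomial, found by
    solving the normal equations sum_i c_i s_(i+k) = sum_t x_t^k y_t exactly."""
    n = degree + 1
    s = [sum(Fraction(x) ** r for x in xs) for r in range(2 * n - 1)]
    rhs = [sum(Fraction(x) ** k * y for x, y in zip(xs, ys)) for k in range(n)]
    rows = [[s[i + k] for i in range(n)] + [rhs[k]] for k in range(n)]
    for col in range(n):  # Gauss-Jordan elimination in rational arithmetic
        piv = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col] / rows[col][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fitted(xs, ys, degree):
    """Values of the least squares polynomial of the given degree at each x_t."""
    c = lsq_poly(xs, ys, degree)
    return [sum(ci * Fraction(x) ** i for i, ci in enumerate(c)) for x in xs]


def three_sums(xs, A, B, m, n, q=0):
    """(sum x^q A* B, sum x^q A B*, sum x^q A* B*) with deg A* = m, deg B* = n."""
    As, Bs = fitted(xs, A, m), fitted(xs, B, n)
    w = [Fraction(x) ** q for x in xs]
    return (
        sum(wt * a * b for wt, a, b in zip(w, As, B)),
        sum(wt * a * b for wt, a, b in zip(w, A, Bs)),
        sum(wt * a * b for wt, a, b in zip(w, As, Bs)),
    )


xs = [1, 2, 3, 4, 5, 6]   # hypothetical abscissas x_1, ..., x_p
A = [3, 1, 4, 1, 5, 9]    # hypothetical values A(x_t)
B = [2, 7, 1, 8, 2, 8]    # hypothetical values B(x_t)

s8 = three_sums(xs, A, B, m=2, n=2)         # (8): all three sums agree
s6 = three_sums(xs, A, B, m=1, n=3)         # (6): first sum equals third
s10 = three_sums(xs, A, B, m=1, n=3, q=2)   # (10) with n = m + q
A_star = fitted(xs, A, 1)
s9 = sum(a_s * (a - a_s) for a_s, a in zip(A_star, A))  # (9), taking B = A

print(s8[0] == s8[1] == s8[2], s6[0] == s6[2], s10[0] == s10[2], s9 == 0)
# True True True True
```

Any other abscissas with at least $\max(m, n) + 1$ distinct values can be substituted; with floating-point least squares the same equalities would hold only up to rounding error.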

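A practical application of (8) has been made by one large insurance company in the case $m = n = 1$. Suppose that $A(x)$ represents an annual payment made $x$ years ago and is an approximately linear function, and that $B(x)$ represents a compound interest function. Then, even if $B(x)$ is not a linear function, we may write approximately

(11) $$\begin{aligned}
\sum_{x=1}^{p} A(x)B(x) &\approx \sum_{x=1}^{p} A^*(x)B(x) = \sum_{x=1}^{p} A(x)B^*(x) \\
&= \sum_{x=1}^{p} A(x)\,(b_0 + b_1 x) \\
&= b_0 \sum_{x=1}^{p} A(x) + b_1 \sum_{x=1}^{p} x A(x) \\
&= b_0 \sum_{x=1}^{p} A(x) + b_1 \sum_{x=1}^{p}\sum_{y=x}^{p} A(y).
\end{aligned}$$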


Thus, if a year-by-year record is kept of the annual payments $A(x)$, the sum $\sum_{x=1}^{p} A(x)$, and the double sum $\sum_{x=1}^{p}\sum_{y=x}^{p} A(y)$, and if $b_0$ and $b_1$ are tabulated as functions of $p$, equation (11) affords a convenient method of evaluating $\sum_{x=1}^{p} A(x)B(x)$ approximately.
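A sketch of this procedure in Python follows, under assumptions that are not from the source: $B(x) = (1 + i)^x$ with a hypothetical rate $i = 0.03$, hypothetical payments $A(x) = 1000 - 20x$ over $p = 10$ years, and $b_0, b_1$ obtained by fitting a least squares line to $B(x)$ at $x = 1, \ldots, p$. Because these payments are exactly linear, $A^* = A$ and (8) makes (11) an exact equality, which the code checks in rational arithmetic; with a merely approximately linear record the error is $\sum_x [A(x) - A^*(x)]B(x)$.

```python
from fractions import Fraction


def ls_line(xs, ys):
    """Least squares line b0 + b1*x through the points (x, y)."""
    p = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    b1 = Fraction(p * sxy - sx * sy) / (p * sxx - sx * sx)
    b0 = (sy - b1 * sx) / p
    return b0, b1


p = 10
i = Fraction(3, 100)                      # hypothetical interest rate
xs = list(range(1, p + 1))                # x = years ago
A = {x: 1000 - 20 * x for x in xs}        # hypothetical, exactly linear payments
B = {x: (1 + i) ** x for x in xs}         # compound interest factor, not linear

b0, b1 = ls_line(xs, [B[x] for x in xs])
single = sum(A[x] for x in xs)                            # sum of A(x)
double = sum(A[y] for x in xs for y in range(x, p + 1))   # double sum of A(y), y >= x
approx = b0 * single + b1 * double                        # right-hand side of (11)
exact = sum(A[x] * B[x] for x in xs)                      # the sum to be evaluated

print(float(exact), float(approx), exact == approx)
```

Replacing `A` by a record that is only roughly linear turns the final comparison into an approximation, as in the text.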
The author wishes to acknowledge that the case m = n = 1 of equation (8) 
and the above application were brought to his attention by John K. Dyer. 